From:  Molly  Bragg  <mbragg@archive.org>
Subject:   Re:  [Info]  Your  Archive 

Date:   April  25,  2006  10:57:46  AM  PDT
To:  Mike  Anderson  <k8iw@rrohio.com> 
Cc:   archive-it@archive.org,  info@archive.org 


Hi  Mike, 

Thank  you  for  your  support!!  This  article  is  quite  a  flashback  for  us  -  thanks  for  sending  it. 

best, 
Molly 


Molly  Bragg 
Partner  Specialist 
Archive-It

mbragg@archive.org
415.561.6799  ext.6# 


On  Apr  19,  2006,  at  10:12  AM,  Mike  Anderson  wrote: 
Hi, 

I  am  sort  of  an  amateur  Internet  archiver  myself.  In  fact  I  started  before  there  was  a  public  Internet,  saving  electronic  copies  of  US  News  and 
World  Report  from  CompuServe.  But  by  1996,  I  was  saving  news  articles  and  technical  articles  from  the  NY  Times.  This  week  I  started  trying  to 
organize  some  of  them,  and  I  found  one  from  June  14,  1996  that  was  prophetic!  (below). 

So  I  stopped  by  your  site,  which  is  still  around,  I  am  happy  to  say.  I  just  think  what  you  are  doing  is  wonderful.  I  will  join  and  support  you  as  I 
can.  My  best  wishes. 

Mike  Anderson 
Hilliard,  Ohio 

June  14,  1996 

Project  Aims  to  Archive  the  Entire  Internet 
By  LAURIE  J.  FLYNN 

There's  a  certain  symmetry  to  the  fact  that  Brewster  Kahle  chose  a  historic  site  in  San  Francisco  as  office  space  for  his  new  venture,  The
Internet  Archives.  After  all,  Kahle,  a  computer  scientist,  has  adopted  the  role  of  chief  curator  of  the  world's  digital  history.

Last  week,  Kahle,  a  35-year-old  entrepreneur,  officially  launched  his  labor  of  love:  nothing  short  of  establishing  a  permanent  record  of  the  entire 
Internet.  His  ambition  is  to  create  a  cultural  time  capsule  that  will  document  the  early  days  of  the  digital  revolution,  preserving  it  in  a  digital  library
that  he  will  make  available  as  a  public  resource. 

"There's  something  very  important  going  on,"  Kahle  told  friends  and  family  at  the  official  kick-off  of  the  company,  which  has  its  offices  on  the 
newly  converted  Presidio  Army  base  overlooking  the  Golden  Gate  Bridge.  "The  stuff  that's  going  on  in  the  digital  domain  now  is  our  cultural
history." 

Looking  back  someday,  he  predicted,  "We'll  have  a  very  good  idea  of  what  the  late  20th  century  was  like." 

The  need  for  an  ongoing  record  of  the  Internet  has  become  sort  of  a  battle  cry  of  the  digerati  in  recent  months,  particularly  as  it  has  become 
clear  just  how  fast  the  Net  is  expanding.  Estimates  vary  widely  on  how  fast  it  is  changing,  though  anyone  who  has  used  the  Web  knows  that 
new  sites  come  and  go  faster  than  TV  sitcoms.  And  even  if  a  Web  site  endures,  old  pages  are  often  purged  from  servers  to  free  up  precious 
space. 

"The  Net,  for  all  intents  and  purposes,  is  completely  different  today  from  what  it  was  a  year  ago,"  Kahle  said.  "It's  gone.  Everyone  out  there  is 
pushing  to  the  future." 

Kahle  compares  the  Net  today  to  the  early  days  of  television,  particularly  as  it  relates  to  major  political  events.  "Early  television  just  evaporated,"
he  said.  "We  don't  even  know  what  it  looked  like.  It  would  be  great  to  see  today  what  campaign  commercials  were  like  in  1950." 


But  to  create  such  an  archive  is  a  project  of  untold  proportions,  Kahle  concedes.  So  far,  he  has  financed  the  project  himself,  using  part  of  the
fortune  he  amassed  when  he  sold  his  Web  publishing  company,  WAIS  Inc.,  to  America  Online  last  year.  Eventually,  he  may  add  additional 
investors. 

The  goal  is  to  create  a  new  breed  of  products  for  mining  terabytes  of  data.  Before  creating  WAIS  in  1992,  Kahle,  a  computer  scientist,  helped
found  the  Thinking  Machines  Corporation,  creator  of  powerful  supercomputers.  It  was  at  Thinking  Machines  that  he  first  began  tackling  the 
question  of  how  to  manage  huge  volumes  of  data  and  make  it  usable  by  people. 

But  Kahle  doesn't  plan  to  achieve  his  goal  of  archiving  the  Net  entirely  on  his  own.  Rather,  he's  accepting  help  and  donations  wherever  he  can
find  them.  As  of  last  week,  he  and  the  five  members  of  his  staff  had  finished  archiving  the  text  of  the  Web,  essentially  by  working  with  a 
donation  of  the  data  from  a  Web  crawler  company. 

Kahle  said  he  hoped  to  entice  others  to  donate  their  own  archives  with  the  promise  that  they  will  be  stored  permanently.  He  hopes  to  have  a 
copy  of  the  entire  Net,  including  Web  images,  Usenet  and  gopher  sites,  by  the  end  of  the  summer. 

The  company  is  also  working  with  the  Smithsonian  Institution  to  collect  Presidential  Web  sites,  a  project  that  will  result  in  an  exhibit  at  the 
American  History  Museum  focusing  on  the  Web's  impact  on  the  1996  election. 

And  then  the  hard  part  will  begin. 

At  that  point,  he  said,  the  company  will  start  working  on  providing  public  access,  clearly  even  a  thornier  issue  than  amassing  the  data.  Kahle 
says  he  is  working  with  the  major  policy  makers  and  experts  on  intellectual  property,  including  law  professors  at  both  Stanford  University  and  the 
University  of  California,  to  help  understand  the  scope  of  the  copyright  issues  the  company  will  soon  face. 

Privacy  concerns,  too,  will  no  doubt  arise  as  the  company  attempts  to  change  "a  medium  that's  assumed  ephemeral  into  an  enduring  one," 
Kahle  said.  The  Internet  Archives  will  consist  of  essentially  two  companies:  The  archives  themselves  will  reside  in  a  not-for-profit  trust,  while
Kahle  and  his  colleagues  will  also  develop  software  for  managing  huge  amounts  of  Internet  data.  That  software  will  eventually  be  packaged  and 
sold  commercially  for  use  with  Intranets  and  other  large  sites,  though  Kahle  has  no  specific  time  frame  yet  for  doing  so.  The  goal,  he  said,  "is  to 
create  a  new  breed  of  products  for  mining  terabytes  of  data."

Kahle  concedes  that  not  everybody  understands  the  importance  of  recording  the  Net  as  a  sort  of  historical  artifact,  and  he  admits  that  many 
people  look  at  him  like  he's  crazy. 

"They  either  say,  'How  could  you  possibly  do  that?,'  or  'Why  would  you  want  to?' " 

Kahle  answers:  "The  idea  is  to  have  an  impact." 

Copyright  1996  The  New  York  Times  Company 

Technically  Speaking

The  Analogness  of  Mayberry  Memories  by  Andrew  K.  Pace


I  sat  down  to  write  this  with  a 
heavy  heart,  as  one  of  my  heroes 
had  just  died — Don  Knotts,  better
known  as  Deputy  Barney  Fife
on  The  Andy  Griffith  Show,  or  to  the
less  culturally  literate,  Mr.  Furley
from  Three's  Company.

Losing  Barney  was  like  losing  a 
family  member.  You  see,  I'm  a  child 
of  the  70s,  and  for  many  of  us  that 
meant  eating  meals  in  the  warm 
glow  of  the  television  instead  of 
around  a  table.  I  spent  literally  every
evening  with  Dan  Rather  on  the  CBS 
Evening  News  and  reruns  of  Andy 
Griffith.  The  impact  has  been  long- 
lasting. 

The  life  lessons  I  learned  growing
up  revolved  around  references
to  Mayberry  and  its  inhabitants — a 
simple,  straightforward  world  in 
which  you  could  learn  life  lessons 
in  25  minutes.  In  that  simple  world, 
books  were  black-on-white  and  read 
left-to-right.  I  doubt  that  anyone 
in  a  town  such  as  Mayberry  could 
have  imagined  an  electronic  book; 
they  didn't  even  have  phones  with
dials.  And,  I  predict,  the  Mayberry 
way  will  persevere  as  the  preferred 
method  for  consuming  books  for  a 
long  time  to  come. 

Now,  I  am  almost  famous  for  my 
wildly  wrong  calculations  about 
the  future  of  e-books.  But,  Mayberry
memories  notwithstanding,  it
seems  to  me  that  ever  since  Google 
announced  intentions  to  digitize 
the  print  world  (with  the  help  of 
libraries),  the  print  world  has  grown 
less  skeptical  of  digitizing  books. 


ANDREW  K.  PACE 
is  head  of  information 
Lots  of  companies  are  now  rushing
in  where  they  once  tread  only  with 
great  trepidation. 

Some  hardware  prognosticators 
are  calling  2006  the  year  of  the  Tablet
PC.  More  computer  than  a  Palm
or  Blackberry  and  lighter  than  most
laptops,  the  Tablet  is  a  fully  powered
PC  with  the  added  benefit  of  a
writing  stylus  and  several  supported
applications  that  enable  handwritten
text  and  drawings.  My  library
loans  Tablets  to  patrons  as  part  of  its 
laptop-lending  program,  and  I  have 
been  a  Tablet  devotee  for  almost 
three  years.  I  am  now  on  my  second 
model,  the  IBM  Thinkpad  Tablet. 
I  have  even  handwritten  parts  of 
columns  using  its  text-conversion 
capabilities. 

Bill  Gates  has  a  lot  riding  on  the 
Tablet  and  has  made  predictions 
even  wilder  than  the  ones  I  used 
to  make  about  e-books.  Nevertheless,
the  Tablet  is  cool  technology.
I  should  be  getting  them  for  free, 
given  the  amount  of  buzz  marketing 
in  which  I  have  engaged  in  airports, 
at  work,  and  at  conferences. 

Reading  between  the  pixels 

I  used  to  keep  track  of  what  boiled 
down  to  three  business  models  for 
e-books  (four,  if  you  count  "free" 
as  a  business  model,  which  most 
vendors  do  not).  HarperCollins  has 
a  plan,  though,  to  keep  its  e-book 
costs  down  in  much  the  same  way 
that  newspapers  remain  cheap.  The 
book  publisher  is  offering  a  pilot 
program  of  ad-supported  e-books 
that  are  free  to  readers.  In  essence, 
out  of  fear  that  online  content  will 
cannibalize  print  sales,  HarperCollins
is  looking  to  fill  the  profit
gap.  It  plans  to  try  the  advertising
model  with  nonfiction  and  reference
works,  apparently  believing  that  fiction
and  ads  might  not  make  good
bedfellows.


With  plans  to  start  digitizing  its 
back  catalog,  HarperCollins  seems 
to  believe  that  most  users  will  not 
read  entire  books  online  (although 
I  would  dispute  this  assumption, 
especially  given  the  looming  ubiquitousness
of  wireless  network
access),  but  will  be  persuaded  to
buy  the  print  version  once  online
excerpts  convince  them  there's  content
worthy  enough  to  be  bought.  If
proven,  this  model  will  break  down 
publishers'  fears  that  e-sales  will 
cannibalize  print  sales. 

No  pulped  fiction 

Is  it  oxymoronic  to  call  a  prepublished
e-book  a  "preprint"?  Well,
Safari  Books  is  taking  a  stab  at  preprints
in  a  post-print  world  with  the
release  of  Rough  Cuts,  thus  providing
access  to  books  as  they  are  being
written.  Rough  Cuts  purchases  allow
unlimited  access  to  manuscripts
in  progress  as  well  as  the  finished
product  in  print.  This  service  could
prove  especially  useful  for  technology
titles  that  seem  to  go  out  of  print
as  soon  as  they  are  in  print.

The  Open  Content  Alliance  is 
also  gearing  up  for  some  rough  cuts 
of  its  own — by  way  of  OCA  negotiating
author-  and  publisher-granted
prepublished  rights  to  electronic
books  (AL,  Jan.,  p.  77).  Under  some
unofficial  leadership  from  the  Internet
Archive's  Brewster  Kahle  and
Rick  Prelinger,  the  OCA  is  planning 
a  major  proof-of-concept  launch  in 
October. 

Several  working  groups — on
topics  including  metadata,  preservation,
and  data  transfer  protocol —
have  already  formed  to  facilitate  the 
more  measured  approach  to  such  an 
undertaking. 

OCA  is  one  of  the  few  groups  taking
an  organized  stand  against  more
restrictive  digital  rights  management,
the  killer  of  e-book  adoption.
Ironically,  some  of  OCA's  founding
members  have  played  major  roles  in 
restrictive  DRM,  a  topic  ripe  enough 
for  a  future  column. 

I'm  not  sure  Deputy  Fife  would 
have  gone  in  for  e-books;  he  never 
seemed  to  like  change  very  much. 
The  dedicated  deputy  left  the  show 
in  1965 — coincidentally  as  it  started 
filming  in  color.  Opie's  voice  was
changing  and  he  was  dancing  to
rock  'n'  roll,  and  Aunt  Bea  had  become
a  women's  libber  by  1960s
North  Carolina  standards;  things 
would  never  be  the  same. 

Life's  lessons  are  easier  to  follow
in  black  and  white  (for  TV  and
books)  and  are  often  more  palatable.
Can  e-books  change  that?  I  like  to
think  there's  a  reason  that  the  simpler
times  are  always  behind  us.

OPEN  SOURCE  WATCH 

The  OpenReader  Consortium
is  a  nonprofit  organization  developing
open  digital  publication  standards.
An  attempt  to  combat  the
format  wars  in  the  e-book  and  digital
publishing  space,  OpenReader
could  present  a  better  future  for
both  publishers  and  consumers.  It  is
already  endorsed  by  several  e-content
vendors  and  providers.  More
information  is  available  at
www.openreader.org.

QUICK  CLICKS 

Announcements 

Follett  has  introduced  Destiny 
Asset  Manager  for  K-12  school 
districts.  Using  a  browser-based  interface,
the  software  tracks  fixed  and
portable  assets,  including  detailed 
information  and  even  a  picture  of 
the  asset.  Destiny  Asset  Manager 
is  part  of  Follett's  larger  Resource
Management  suite. 

ProQuest  Information  and
Learning  has  appointed  Simon
Beale  as  senior  vice  president  of
global  sales.  Beale  joined  ProQuest
in  2002  and  took  on  responsibility
for  international  business  development
initiatives  in  2004.  The  announcement
comes  on  the  heels  of
ProQuest's  disclosure  of  material  accounting
irregularities,  encouraging
the  company  to  reissue  its  financial
statements  as  far  back  as  six  years.
In  a  February  9  press  release,  the
company  stated  it  "believes  that  the
accounting  irregularities  do  not  affect
the  company's  cash  balances,
the  amounts  invoiced  to  customers,
cash  receipts  from  customers,
or  disbursements  to  publishers  and
suppliers."

Ask.com  has  asked  a  librarian
for  help — with  the  appointment
of  Gary  Price  as  director  of  online
information  resources.  Price,  a
current  editor  of  the  popular  ResourceShelf
(www.resourceshelf.com),
former  editor  of  the  online
Search  Engine  Watch,  and  frequent
speaker  on  search  engine  companies
and  technologies,  will  lead
Ask.com's  outreach  efforts  within
the  library  and  educational  communities
and  play  a  role  in  new
product  development.

PROFESSIONAL  DEVELOPMENT


Working  Knowledge 


A  Tale  of  Two  Librarians


by  Mary  Pergander 


Q:  I  have  been  in  my  current
position  for  five  years.  I  do  a
good  job,  and  I  accept  extra
work.  I  keep  waiting  for  my
manager  to  recognize  this  and  promote
me,  but  it  seems  my  work  is  not  valued.
How  much  longer  should  I  stay  before
looking  for  a  better  opportunity  elsewhere?

You  seem  to  be  following  a  very 
typical  pathway:  Get  a  job,  perform 
work  well,  expect  to  be  noticed  and 
rewarded.  This  is  common,  but  often 
not  effective. 

Nationally  recognized  career  strategist
Adele  Scheele  has  written  about
"the  good  student"  syndrome.  She
describes  individuals  who  practice  the
behaviors  that  worked  well  in  school:
doing  your  best,  then  waiting  passively
for  recognition,  approval,  and  promotion.
In  the  workplace,  this  can  result
in  great  frustration  when  that  expected
recognition  fails  to  come.  According
to  Scheele,  such  employees  "become
resentful  for  not  being  selected  when
others  who  have  done  less  or  are  newer
are  given  opportunities  instead."

Stop  waiting!  Have  you  let  your 
manager  know  you  are  interested  in 
another  type  of  position?  This  seems 
so  basic,  yet  many  people  overlook 
this  essential  first  step.  "It  should 
be  obvious  to  my  boss,"  some  say. 
Others  recall  telling  their  managers 
during  the  job  interview  that  they  are 
interested  in  future  opportunities, 
but  never  mentioning  it  again! 

Second,  ask  your  manager  what 
skills  you  need  to  develop  and  what 
experiences  you  need  to  have  for  the 
position  you  desire.  Can  you  obtain 
these  in  the  current  workplace  through  projects,
committees,  teams,  or  assignments? 
Will  you  also  need  to  enhance  your 
professional  development  through 
seminars  or  classes?  Are  you  willing 
to  do  so?  If  you  follow  through  with 
these  suggestions,  you  may  get  the 
position  you  seek.  You  will  also  be  in 
a  stronger  position  to  find  your  desired
job  elsewhere.

Q:  I  am  happy  in  my  current
position  and  feel  it  is  just
right  for  me.  However,  my
new  manager  keeps  asking
me  to  take  on  special  projects  and  says
that  I  am  capable  of  much  more.  This
really  bothers  me.  I  do  not  want  to  take
on  any  more  than  I  already  have.  How
can  I  get  him  to  understand  this  without
jeopardizing  his  respect  for  me?

Let's  examine  some  of  your  apparent
assumptions.  Are  the  special  projects
not  part  of  your  job?  Most  jobs
consist  of  varying  degrees  of  routine
and  nonroutine  activities  and  responsibilities.
The  projects  may  be  nonroutine
but  necessary  contributions  to
the  work  of  the  department.  Is  your
manager  using  projects  to  distribute 
work  throughout  the  department, 
or  more  specifically  to  develop  your 
talents  or  skills?  Try  talking  to  him 
about  this.  He  may  assume  that  you 
want  advancement  or  a  broader  job, 
when  that  is  not  the  case. 

I  would  caution  you,  however, 
about  assuming  that  your  "old"  job 
of  just  doing  the  routine  tasks  you 
enjoy  is  still  feasible  in  this  economic 
climate.  Your  manager  may  see  that 
he  must  demonstrate  increased  productivity
of  each  employee  or  risk
losing  staff. 

Is  there  a  particular  reason  you  do 
not  want  additional  opportunities 
right  now?  A  serious  illness,  challenging
family  situation,  or  other  stresses
can  all  make  it  temporarily  difficult  to
commit  to  more  professionally.  When
this  is  the  case,  it  can  be  helpful  to
explain  this  to  your  boss.  If  he  is  offering
the  opportunities  primarily  for
your  professional  growth,  he  may  be
able  to  wait.  However,  if  he  is  reorganizing
the  department  or  trying  to  increase
your  productivity,  waiting  may
not  be  an  option.

Finally,  I  would  challenge  you  to 
push  yourself  outside  your  comfort 
zone.  Remaining  attached  to  the  job 
as  it  used  to  be  could  cause  you  to 
suffer,  because  that  job  may  no  longer 
exist.  Find  ways  to  accept  the  opportunity
to  stretch  yourself.  With  time,
this  new  job,  too,  may  become  comfortable
and  familiar.  Of  course,  just
about  then  it  will  change  again!


WORKING  WISDOM 


Can  you  identify  with  one  of  the 
scenarios  presented  above?  Are  you 
inadvertently  holding  yourself  back 
by  patiently  waiting  and  silently 
hoping  to  be  recognized?  Or  are  you 
comfortable  in  your  current  position 
and  hoping  to  avoid  change?  Either 
situation  can  be  improved  with  clarified
expectations.  What  next  steps
should  you  take? 

How  To:  Podcasting  In  Four  Easy  Steps

Here  are  four  helpful  steps  for  getting  started  with  your  own  podcast. 

By  Laurie  Sullivan,  TechWeb.com 

April  17,  2006 

URL:  http://www.informationweek.com/story/showArticle.jhtml?articleID=185303349

Advertisers,  marketers  and  media  relations  are  paying  serious  attention  to  portable  media  as  a  method  to  reach 
audiences. 

Spending  on  advertising  in  blogs,  podcasts,  and  Really  Simple  Syndication  (RSS)  feeds  is  expected  to  rise  145
percent  to  $49.8  million  this  year,  according  to  a  recent  report  by  PQ  Media  LLC.  The  research  firm  said
music,  comedy,  and  science  and  technology  combined  hold  a  27.3  percent  share  of  downloads.

Rather  than  dig  through  the  mounds  of  information  on  the  Internet,  here  are  four  helpful  steps  for  getting 
started  with  your  own  podcast. 

1.  Invest  in  a  digital  recorder  or  a  microphone.  A  cheap  one  will  typically  pick  up  background  noise.
Behringer  offers  a  Studio  Condenser  Microphone  C-1  for  $49.99.  If  money  is  no  object,  look  for  a  large-
diaphragm  condenser  microphone  to  reproduce  a  warm  representation  of  your  voice.  A  good  mic  requires  an
external  power  source  and  could  set  you  back  up  to  $500.

Sony  makes  several  digital  recorders,  such  as  the  ICD-BM1,  that  use  memory  sticks  to  store  audio  files.
MiniDisc  Players  also  do  the  trick. 

2.  Podcasting  requires  audio  software.  Several  companies  that  develop  open  source  audio  software  offer  the 
basics  for  free.  A  couple  of  these  are  Audacity,  and  WavePad  from  NCH  Swift  Sound. 

3.  If  you  have  video,  edit  it.  Some  experts  say  it  takes  more  skill  to  edit  video  than  audio.  Maybe  so,  but
software  companies,  such  as  muvee  Technologies,  have  made  it  easier. 

Turn  over  the  raw  video  content  to  muvee  autoProducer  5,  set  a  few  options,  such  as  the  video's  run  time,  and
choose  from  the  package's  style  and  music  content.  Editors  can  save  files  in  MP3,  WAV,  WMA,  and  AAC
formats.  The  "Creative  Tips  and  Tricks"  section  on  muvee's  site  walks  you  through  video-editing  techniques,  if
needed.  Otherwise,  the  software  itself  does  the  rest.
4.  Find  a  host.  Podcasters  will  need  a  place  to  host  the  audio  files  and  a  Web  site  to  let  listeners  subscribe  to
the  podcast  through  a  Really  Simple  Syndication  (RSS)  feed,  using  a  service  such  as  Feedburner  Inc.  or  Liberated
Syndication  Network  (LSN).

LSN  charges  $5  per  month  for  100MB  of  file  storage.  There  are  also  sites  that  cost  nothing.

Ourmedia  hosts  podcasts  for  free.  Drupal,  an  open-source  content  management  platform,  offers  free  hosting
through  its  Bryght  hosted  service.  Other  sites  also  participate  in  an  open  registry,  storing  material  on  their 
servers.  The  Internet  Archive  has  agreed  to  provide  free  storage  space  and  free  bandwidth  for  media  files 
published  by  Ourmedia  members  forever,  it  says  on  its  Web  site. 
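
For readers who would rather see what the RSS feed in step 4 contains than rely only on a hosting service, here is a minimal sketch in Python. It is an illustration, not part of the original article: the function name build_podcast_feed, the episode dictionary keys, and the example.org URLs are all hypothetical. It relies only on the RSS 2.0 convention that each episode is an <item> whose <enclosure> element points at the audio file.

```python
import xml.etree.ElementTree as ET


def build_podcast_feed(title, link, description, episodes):
    """Return a minimal RSS 2.0 document as a string.

    `episodes` is a list of dicts with hypothetical keys:
    'title', 'url', 'length' (file size in bytes) and, optionally,
    'mime' (defaults to audio/mpeg, the usual type for MP3 files).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three elements RSS 2.0 requires on every channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["url"]
        # The enclosure is what podcast clients actually download.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["url"],
            length=str(ep["length"]),
            type=ep.get("mime", "audio/mpeg"),
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder show and episode; replace with your own hosted file.
    print(build_podcast_feed(
        "My Show",
        "http://example.org/",
        "A test podcast",
        [{"title": "Episode 1",
          "url": "http://example.org/ep1.mp3",
          "length": 1234567}],
    ))
```

Saving the printed output as a .xml file next to the audio files on whichever host you choose is, in principle, enough for a podcast client to subscribe. Services such as those named above generate and maintain this file for you.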


With  these  basics  under  your  belt,  you'll  be  podcasting  in  no  time. 

Creative  Commons  (CC)  offers  licenses  that  allow  you  to  publish  material  with  clear-cut  licensing  terms
that  reserve  some  of  your  rights  while  giving  the  public  others.  CC  offers  a  number  of  tools  to  implement 
the  licenses  into  the  metadata  of  various  media  formats.  Until  recently,  its  ccPublisher  program,  which 
allows  you  to  upload  CC-licensed  content  to  the  Internet  Archive,  had  official  binary  releases  only  for 
Apple  Macintosh  OS  X  and  Microsoft  Windows  XP.  This  is  about  to  change,  with  the  upcoming  release  of 
ccPublisher  2. 

The  new  software  puts  an  emphasis  on  cross-platform  compatibility.  The  ccPublisher  team  is  writing 
ccPublisher  2  entirely  in  Python  --  a  free,  open  source,  interpreted  programming  language.  This  allows 
ccPublisher  to  run  on  any  operating  system  with  Python  available  for  it,  which  today  includes  more  than  a 
dozen  systems. 

The  ccPublisher  team  recently  released  its  first  public  beta,  0.9.1.  This  release  is  almost  feature-complete,
lacking  only  an  installer  component  and  a  crash-feedback  reporter.

ccPublisher  2  needs  Python  2.4,  wxPython  2.6,  and  lxml  to  run.  Most  current  Linux  distributions  (including
Debian)  provide  these.  If  your  distribution  does,  simply  invoke  your  package  manager  to  download  and
install  the  necessary  libraries.  If  not,  you  can  install  all  three  easily.  Python  uses  the  GNU  Compiler
Collection  (GCC)  standard  ./configure  &&  make  &&  make  install  from  the  source  directory,  and
wxPython  and  lxml  use  the  Python  module  standard  python  setup.py  install.
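
Before launching the program, it can help to confirm that the interpreter can actually see those libraries. The following is a rough sketch, not something shipped with ccPublisher: the function name check_dependencies is hypothetical, and it only tests whether the modules are importable, not whether they are the exact versions named above. It is also written for a current Python 3 interpreter (importlib.util.find_spec does not exist in Python 2.4), so treat it as an illustration of the idea. wxPython is imported under the module name wx.

```python
import importlib.util
import sys

# Import names for the libraries the article lists:
# wxPython installs a module named "wx"; lxml installs "lxml".
REQUIRED_MODULES = ("wx", "lxml")
MIN_PYTHON = (2, 4)  # minimum interpreter version named in the article


def check_dependencies(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each requirement to True (found) or False."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_dependencies().items():
        print(f"{requirement}: {'found' if ok else 'MISSING'}")
```

Anything reported as MISSING would then be installed through the package manager or from source, as described above.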

Beta  1  offers  a  sparse,  straightforward  interface.  Follow  a  few  prompts  to  select  applicable  files  for  upload, 
enter  metadata  for  the  Internet  Archive  listings,  and  select  a  CC  license  and  the  files'  formats.  The  program 
uploads  all  the  information  to  the  Internet  Archive,  where  it  appears  within  24  hours. 

The  ccPublisher  2  team  plan  several  useful  features  for  the  official  release,  including  complete  conversion  to 
the  more  modular  and  extensible  ccPublisher  2  architecture,  support  for  extensions  and  plugins,  and  easy 
customization  for  third-party  developers. 

The  ccPublisher  2  developers  have  already  set  their  sights  on  version  1.1,  which  they  say  will  embed  CC 
metadata  into  numerous  formats  and  offer  full  support  for  localization.  They  also  hope  to  allow  users  to 
extract  existing  metadata  from  the  media  files  themselves,  making  the  process  much  faster,  particularly 
when  uploading  a  large  number  of  files. 

The  current  ccPublisher  2  beta  release  is  a  useful,  yet  somewhat  limited,  application  for  Internet  Archive 
publishing.  But  if  its  development  road  map  is  an  indicator,  ccPublisher  2  is  set  to  become  an  invaluable
tool  in  the  future. 

Links 


1.  "Creative  Commons"  -  http://creativecommons.org/ 

2.  "Internet  Archive"  -  http://www.archive.org/ 

3.  "ccPublisher  2"  -  http://wiki.creativecommons.org/CcPublisher_2_Releases 

4.  "Python  2.4"  -  http://www.python.org/ 

5.  "lxml"  -  http://codespeak.net/lxml/

Internet  Archive's  value,  legality  debated  in  copyright  suit

JOE  MANDAK
Associated  Press 

PHILADELPHIA  -  An  ongoing  lawsuit  between  a  company  and  a  popular  archive  of  Web  pages  raises  questions  about 
whether  the  archive  unavoidably  violates  copyright  laws  while  providing  a  valuable  service,  according  to  attorneys  and  an
independent  law  expert. 

The  San  Francisco-based  nonprofit  Internet  Archive  was  created  in  1996  to  preserve  Web  pages  that  will  eventually  be 
deleted  or  changed.  More  than  55  billion  pages  are  stored  there. 

A  health  care  company  claims  the  archive  didn't  do  enough  to  protect  copyrighted  information  that  helped  a  competing  firm 
win  a  trademark  suit. 

The  archive  "is  just  like  a  big  vacuum  cleaner,  sucking  up  information  and  making  it  available"  to  anyone  with  a  Web 
browser,  said  Scott  S.  Christie,  an  attorney  representing  Healthcare  Advocates  Inc. 

"That  has  some  social  value,  but  in  doing  so  they  are  grabbing  information  that  they're  not  entitled  to,"  he  said.  "More 
importantly,  they  are  telling  people  that  they  will  take  it  off  the  shelf  if  you  do  a  certain  thing  a  certain  way  -  but  that 
didn't  happen  in  this  case." 

Carnegie  Mellon  University  computer  science  professor  Michael  Shamos,  an  expert  in  Internet  law,  said  archiving  like  that 
done  by  the  Internet  Archive  is  "the  biggest  copyright  infringement  in  the  world,"  but  said  it  is  done  in  a  way  "that  almost 
nobody  cares  about." 

Shamos  said  Web  site  publishers  typically  don't  mind  that  their  sites  wind  up  on  the  Internet  Archive,  because  the  whole 
point  of  posting  Web  sites  is  to  get  as  many  people  as  possible  to  see  them.  The  rub  is  that  a  Webmaster  loses  control 
over  the  site,  because  the  Internet  Archive  keeps  that  information  on  the  Web  even  after  the  page  is  dismantled,  Shamos 
said. 

Copyrights  are  only  effective  if  the  holder  is  vigilant  about  maintaining  control  of  the  material,  Shamos  said. 

"That's  the  thing  about  rights,  you  have  to  exercise  them.  If  Pamela  Anderson  wants  to  trespass  on  my  front  lawn,  it's  OK 
with  me,"  Shamos  said. 

The  plaintiff  in  the  lawsuit,  filed  in  U.S.  District  Court  in  Philadelphia  last  year,  wasn't  OK  with  how  a  competitor's  attorneys 
used  their  archived  Web  site. 

In  2003,  Healthcare  Advocates  Inc.  filed  a  lawsuit  claiming  a  similarly  named  firm  stole  its  trade  secrets  from  copyrighted 
brochures. 

The  defendant's  law  firm  used  the  Internet  Archive  to  access  old  versions  of  the  Healthcare  Advocates  Web  site.  The  law 
firm  won  the  suit  after  it  showed  some  of  the  contested  information  wasn't  secret  at  all  because  it  had  been  spelled  out  on 
Healthcare  Advocates'  Web  site. 

Healthcare  Advocates  then  sued  the  Internet  Archive.  It  alleged  the  nonprofit  failed  to  protect  that  information  after 
Healthcare  Advocates  asked  the  archive  how  it  could  restrict  access  to  certain  files. 

Stefani  Shanberg,  an  attorney  for  the  Internet  Archive,  said  Web  page  owners  can  ask  that  information  be  removed  from 
the  archive  and  can  keep  the  archives  from  grabbing  it  in  the  first  place. 

"We  voluntarily  inform  Web  site  owners  that  they  can  voluntarily  restrict  access  to  their  material,"  Shanberg  said.  "The 
archive  shouldn't  have  been  dragged  into  this  (lawsuit)  in  the  first  place." 

Federal  copyright  laws  have  exceptions  designed  to  protect  search  engines  and  online  archives  from  such  lawsuits.  And 
Shamos said that copyrights aren't violated when attorneys are digging for the information in defense of a lawsuit.

"The  needs  of  discovery  in  litigation  trump  the  copyright.  You're  making  one  copy  for  use  in  court,"  Shamos  said. 

But Christie said the archive didn't do enough to protect Healthcare Advocates' Web pages from prying eyes.

"I  think  Internet  Archive  does  a  fine  job.  I  think  they're  a  valuable  public  resource,"  Christie  said.  "I  just  take  issue  with  the 
way they perform their public service."
ON  THE  NET 

http://www.archive.org

http://www.healthcareadvocates.com
Ephemeral films, resurrected on the Web

By Jim Regan | csmonitor.com

HALIFAX,  NOVA  SCOTIA  -  Given  the  ephemeral  nature  of  the  Web,  it  can  be  interesting  to  note  just  how 
much  of  the  ephemeral  is  actually  being  preserved  and  (perhaps  as  important)  made  widely  available  by  the 
Web. From long-forgotten product packaging and period postcards to family photos and even East German
Paper Shopping Bags, the Web is jammed with items that would have been lost forever to attics and museum
warehouses  without  the  advent  of  online  distribution.  Probably  the  most  entertaining  exhibits  in  this  category 
feature  ephemeral  films  -  ads  and  movies  produced  for  a  specific  time  and  purpose,  with  little  or  no  thought 
to  long-term  relevance  or  preservation  -  and  this  week  we  look  at  two  central  repositories  of  the  transitory  in 
motion  pictures.  You  won't  find  "Citizen  Kane"  here,  but  you  already  know  who  Rosebud  is  anyway. 

The  first  of  these  online  resources  is  a  very  recent  addition  to  the  Web,  and  one  which  has  attracted  a  fair 
amount  of  attention  since  its  February  launch.  As  the  name  implies,  the  National  Archives  on  Google  Video 
project  is  a  cooperative  effort  as  the  two  organizations  make  roughly  100  films  available  for  online  viewing 
or  download  -  meaning  that  not  only  can  people  see  the  films  without  making  a  trip  to  the  Archives  in 
Washington, they are free to keep personal copies of anything that piques their interest. Because this is a pilot program,
there is no official word of more films being added to this first compilation, but talks of "exploring the
possibilities  of  expanding  the  online  film  collection"  certainly  sound  promising. 

Of  course,  there's  plenty  to  see  even  in  the  preliminary  anthology,  and  heading  the  list  are  15  of  the  more  than 
250  films  made  by  NASA's  Office  of  Public  Affairs  between  1962  and  1981.  These  (mostly)  half-hour 
presentations include a 1963 biography of John Glenn, a truncated 1969 documentary about the Apollo 11
mission,  and  a  '67  essay  on  the  challenges  of  photographing  the  moon  in  anticipation  of  a  manned  lunar 
landing.  (Hint:  Try  shooting  a  candy  apple  while  you  and  it  are  on  different  cars  on  an  amusement  park  ride.) 
Of  course,  for  dramatic  effect,  one  can't  beat  the  unmistakable  presence  of  Orson  Welles,  who  guides  viewers 
through  the  1975  production,  "Who's  Out  There?"  (And  who  better  to  ask  that  question  than  the  man  who 
panicked  thousands  with  his  interpretation  of  "War  of  the  Worlds"?) 

Although  these  videos  used  the  most  advanced  animations  and  highest  quality  imagery  available  at  the  time, 
they  demonstrate  that  major  progress  has  been  made  in  movie  production  as  well  as  extraterrestrial 
exploration,  but  they  also  convey  the  enthusiasm  and  anticipation  that  accompanied  the  space  program  at  the 
time.  Meanwhile,  for  those  of  us  ...  of  a  certain  age  ...  who  were  occasionally  treated  to  classroom  films  as 
opposed  to  classroom  videos,  the  dated  production  values  will  elicit  a  twinge  of  nostalgia  even  if  we  never 
saw  these  particular  titles. 

For those of a different certain age, the United States government-financed United Newsreels, created for
overseas  consumption  during  World  War  II,  could  possibly  spark  a  few  memories  of  their  own.  Produced  by 
the  Office  of  War  Information,  these  10-minute  films  are  nothing  if  not  resolutely  optimistic  in  their 
depictions  of  Allied  endeavors  -  true  to  the  rules  of  propaganda  on  both  sides  of  any  war.  (In  fact,  the  thought 
occurred  to  me  while  watching  many  of  the  newsreels  that  these  very  scenes  could  have  just  as  easily 
appeared  in  the  propaganda  films  of  the  German,  Italian,  Japanese,  or  Russian  forces.)  Still,  even  a  one-sided 
portrayal  of  events  has  its  own  historical  significance,  and  the  impact  of  much  of  the  footage,  captured  in  the 
field  by  military  combat  photographers,  is  no  less  compelling  for  the  earnest  background  music  and  lack  of 
journalistic objectivity.

Finally,  taking  us  back  to  the  classroom,  a  collection  of  short  films,  produced  by  the  Department  of  the 
Interior  between  1916  and  1970,  introduce  viewers  to  national  parks,  reclamation  projects,  the  "conquest  of 
the  Colorado  River"  (i.e.,  the  Boulder  Dam),  and  efforts  to  reintroduce  "Children  in  the  City"  to  the  great 
outdoors. 

The  videos  themselves  are  embedded  into  browser  pages,  begin  playing  almost  immediately,  and  once 
completely  loaded,  allow  the  viewer  to  skip  ahead  to  any  point  in  the  narrative.  If  you  prefer  viewing  offline, 
or  your  connection  is  slow  enough  that  you'd  rather  have  your  computer  wait  for  the  files  while  you  do 
something  else,  download  options  are  available  for  playback  on  Mac  and  Windows  PCs,  Video  iPods,  and 
Sony  PSPs.  (And  for  the  truly  dedicated  audience,  Google  offers  the  option  of  automatically  playing  all  the 
videos  in  a  given  category  in  an  uninterrupted  multifeature  marathon.) 

While  the  Google/NARA  films  are  only  recent  additions  to  the  Web,  those  familiar  with  the  Internet  Archive 
will  know  that  that  organization's  Moving  Image  Archive  has  been  online  for  years  -  and,  with  material 
coming  from  multiple  sources,  has  both  a  larger  collection  and  a  wider  variety  of  material  on  hand. 

The  'crown  jewel'  of  the  MIA  is  the  Prelinger  Archive,  which  has  been  noted  in  this  space  before,  but  whose 
existence  is  well  worth  mentioning  again.  With  almost  2,000  films  gathered  over  20  years,  the  Prelinger 
Archive  has  a  spectacular  collection  of  video  artifacts  that  range  from  television  commercials,  to  coverage  of 
the  Hindenburg  disaster,  to  the  infamous  cold-war  classic,  "Duck  And  Cover"  -  which  has  been  downloaded 
nearly  a  quarter  of  a  million  times.  From  1940s  social  hygiene  films  (Are  You  Popular?),  to  a  two-part  '50s 
sitcom  showing  young  women  how  to  land  young  men  with  the  increased  consumption  of  electricity,  from 
the  generically  inspiring  Your  Name  Here,  to  the  surreal  Relaxed  Wife  (promoting  the  tranquilizer  Atarax), 
and  the  equally  surreal  Design  for  Dreaming  (a  1956  General  Motors/Frigidaire  production  that  looks  like  a 
cross  between  "An  American  in  Paris"  and  an  outtake  from  "Twin  Peaks"),  the  Prelinger  Archive  presents  an 
almost  inexhaustible  supply  of  both  motion  picture  history  and  amusement. 

But  there's  more  to  the  MIA  than  the  Prelinger  Archive.  More  than  two  dozen  additional  collections  offer 
such  options  as  Open  Source  Movies  submitted  by  the  online  community,  the  Election  2004  Video  Archive, 
and  the  Net  Cafe  and  Computer  Chronicles  television  series.  In  a  theatrical  vein,  there  are  collections  of 
vintage  cartoons  and  movie  trailers,  Cinemocracy  and  Universal  Newsreels  for  additional  wartime  footage, 
and  more  than  600  short  and  Feature  Films  that  have  made  their  way  into  the  public  domain  -  including  such 
titles as "Flash Gordon Conquers the Universe," Charlie Chaplin's "The Kid," Buster Keaton's "Paleface," and
"Night  of  the  Living  Dead."  ("Plan  9  From  Outer  Space"  had  been  posted  here  as  well,  but  it  is  currently 
unavailable  due  to  "issues  with  the  item's  content.") 

As  you  might  have  gathered,  this  week's  article  could  easily  evolve  into  nothing  more  than  a  list  of  lists,  so 
I'll  stop  myself  here  and  just  say  that  this  small  sample  should  give  you  some  idea  of  the  breadth  of  material 
available.  In  terms  of  viewing  options,  the  Moving  Image  Archive  offers  its  content  both  in  streaming  MPEG 
and  RealVideo  formats,  as  well  as  downloadable  RealVideo,  DivX,  and  MPEG  files  of  various  sizes.  As  with 
the  NARA  films,  downloading  and  viewing  offline  will  be  the  only  practical  option  for  dial-up  users,  but  as 
long  as  you're  not  spending  every  waking  hour  on  your  computer,  you  can  probably  schedule  sufficient  time 
for  a  few  downloads. 

After  all,  you  wouldn't  want  to  miss  "Radar  Men  From  The  Moon,"  would  you? 

1906 quake returns to Bay Area

By  Mary  Anne  Ostrom 
Mercury News

EXHIBIT,  ONLINE  ARCHIVE  RE-CREATE  EVENTS  SURROUNDING  DISASTER 

One  hundred  years  after  the  San  Francisco  earthquake,  the  state's  top  historic  libraries  are  unveiling  a  massive  $1  million 
online  archive  that  chronicles  in  amazing  detail  what  happened  on  that  fateful  April  18. 

Thanks  to  the  digital  age,  the  project  brings  the  quake  of  '06  alive  in  ways  never  possible  before.  You  can  read  scribbled 
recollections  as  aftershocks  hit,  hear  voices  of  San  Franciscans  screaming  as  the  big  temblor  relentlessly  shakes,  and  view 
photos  of  what  your  neighborhood  or  town  looked  like  in  April  1906. 

In one stream-of-consciousness letter, the author writes at 8:15 that morning -- three hours after the quake: "I lay
perfectly quiet till I thought the end was surely here."

The  letter  joins  14,000  other  images  and  7,000  pages  of  text  that  have  rarely  or  never  been  seen  by  the  public. 

Want to look at hand-colored maps of where the Army Corps set off dynamite in misguided attempts to stop the post-quake
fires? How about actor John Barrymore's story of fetching a brandy from the Bohemian Club for a beautiful quake
refugee  stranded  in  Union  Square?  Or  a  poster  demanding  that  officials  stop  squandering  Red  Cross  funds  on  automobiles 
and  high  salaries? 

The online trove (http://bancroft.berkeley.edu/collections/earthquakeandfire/) took archivists and researchers from
six  California  institutions  five  years  to  build.  The  online  archive,  of  which  Stanford  University  Libraries,  among  others,  is  a 
partner,  officially  launches  next  Thursday.  A  companion  physical  exhibit  opened  Wednesday  on  the  University  of  California- 
Berkeley  campus. 

While  San  Francisco  is  the  focus,  the  project  also  depicts  how  the  large  swath  of  Northern  California  fared  after  the  7.8 
quake,  including  the  ensuing  exodus  from  the  city  to  San  Jose  and  Oakland  and  how  the  region  rebuilt  itself. 

Archive  documents  recount  the  rush  from  around  the  world  to  get  aid  to  the  area  --  including  trains  of  condensed  milk  from 
the  Midwest  and  governments  as  diverse  as  China  and  Guatemala  sending  money. 

"It wasn't just San Francisco's earthquake; it was a lot of people's," said Theresa Salazar of UC-Berkeley's Bancroft
Library,  the  project's  curator. 

The  project  also  illustrates  the  daily  grind  of  recovering;  for  example,  a  cookbook  --  50  recipes  for  50  cents  --  was  written 
for  families  who  lost  all  their  recipes.  Other  items  document  the  horrors,  including  rampant  lawlessness,  graft  and  racism 
that plagued the city. A section labeled "Thieves" includes a photo of Army personnel looting shoes on Market Street.

Documents describe the deaths at Santa Clara's Agnews state mental hospital: "A calamity that buried beneath the ruins
of  the  collapsed  buildings  the  bodies  of  eleven  officers  and  one  hundred  patients."  It  was  the  largest  loss  of  life  for  a  single 
site  that  day.  And  an  outraged  governor  responded  to  erroneous  newspaper  stories  that  patients  had  died  killing  one 
another. 

On  an  interactive  map  composed  of  several  thousand  photos,  you  can  search  San  Francisco  by  neighborhood  to  pull  up 
damage  photos.  To  create  it,  one  Bancroft  pictorial  archivist,  Chris  McDonald,  pored  over  5,000  photos  assigning 
geographic  locations  to  each. 

The  archivists  also  assembled  an  11-photo,  360-degree  panorama  of  post-quake  San  Francisco  as  it  looked  from  the  roof  of 
Nob  Hill's  Fairmont  Hotel. 

Until  now,  many  of  the  items  were  scattered  statewide  and  only  a  few  were  ever  available  digitally. 

"One of the main reasons to put it online is to make it available to almost everyone," said Mary Elings, the project's
digital archivist.

Brewster Kahle, digital librarian of the Internet Archive, which is not associated with the project, said he was fascinated as
he viewed photos of damage from the Presidio, where he now works and lives.


"When you have something right there, there is a magic to it," Kahle said. "It's not a textbook or a documentary. You
can go look in your own neighborhood."

Surprising to some, but not to historians, is the sheer volume of photos that remain from 1906. The quake hit "right after
the birth of Kodak's roll-film camera. San Francisco was a hotbed of photography," said Philip Fradkin, a consultant to the
online archive, which he also mined to write "The Great Earthquake and Firestorms of 1906," released last year. Fradkin
spent two years scouring archives and labeling photos "until I went cross-eyed."

Fradkin's  early  research  was  the  seed  for  the  project,  which  was  embraced  by  then-state  librarian  Kevin  Starr,  who 
procured  federal  funding  for  the  online  archive. 

Among Fradkin's finds was a collection of Jack London photos, held at Southern California's Huntington Library. London
covered  the  quake  for  Collier's  Magazine,  and  with  wife  Charmian  traveled  widely  taking  pictures  and  writing  about  what 
they  saw.  London's  photos  and  Charmian's  diary  will  be  the  subject  of  a  California  Historical  Society  exhibit  opening  next 
month. 

The  Bancroft  Library  remains  the  biggest  repository  for  1906  material.  Located  on  Valencia  Street  in  San  Francisco  at  the 
time  of  the  earthquake,  it  was  the  only  major  library  in  the  city  to  survive.  The  library  today,  ironically,  is  housed  in  a 
temporary  building  at  its  modern  UC-Berkeley  home  while  it  undergoes  a  seismic  retrofit. 

The  digital  archive  is  expected  to  grow  and  become  a  lasting  repository  of  information  about  what  many  still  consider  the 
worst  urban  disaster  in  U.S.  history,  say  planners. 

"We've had a tsunami and a flood. People are more tuned in, I think, than they may have been," said Stephen Becker,
executive director of the California Historical Society. And even if interest wanes, he added, "The archives are not going
away."

Contact  Mary  Anne  Ostrom  at  mostrom@mercurynews.com  or  (415)  477-3794 
DRM vs. Open

Digital Rights Management for the masses is a great
initiative,  but  buyer  be  aware  that  you  can  sometimes  get 
those  clips  legitimately  for  free. 

By  Rebort 

For  content  creators  Google's  Video 
Marketplace  (see  story)  may  prove  a 
welcome  source  of  revenue.  However,  the 
downside  is  that  some  people  will 
commercialise  what  may  already  be 
legitimately  available  for  download  for  free. 

Some  films  are  already  in  the  public  domain 
or  are  free  to  watch  under  less  restrictive 
licencing  than  on  pay  video  sites. 

For  example,  on  Google  Video  the  Sherlock 
Holmes film Dressed To Kill (1946), a
black-and-white  feature  starring  the  stiff- 
uppered  Basil  Rathbone,  is  selling  at  $7.99, 
or  a  day  pass  at  $0.99.  But  you  can 
download a good quality Mpeg2 version
(the  file-type  used  for  DVDs)  of  the  same 
film  for  free  at  The  Internet  Archive.  If  you 
are  not  prepared  to  wait  the  few  hours  it 
takes  for  the  film  to  download  (this  is  over 
a  fast  connection),  there  are  more 
compressed  files  that  you  can  download 
and  view  on  The  Internet  Archive,  including 
a  tiny  streaming  format. 


[Image: Thumbnail previews of Sherlock Holmes Dressed To Kill, which you can download for free off The Internet Archive]


The  non-profit  Internet  Archive  carries  a  huge  amount  of  other  moving 
image  material  -  26,947  video  clips  at  the  time  of  writing  -  ranging  from 
German artfilms of the Sixties to fascinating cigarette advertisements in
nostalgic  technicolor.  The  latter  is  part  of  the  Prelinger  Archive,  a 
collection  of  ephemera  like  corporate  videos,  television  spots,  public 
information  films,  and  such-like  collected  by  Rick  Prelinger  over  a  period 
of 20 years. There are also 627 movie features, including Frank Capra's
Why  We  Fight  series  of  WW2  propaganda  films.  All  are  free  to  download, 
often  in  DVD  quality.  Even  the  amateur  footage  on  The  Internet  Archive 
can  be  compelling  viewing.  Just  check  out  some  of  the  chilling  footage 
uploaded  of  the  Asian  Tsunami. 

The Internet Archive was founded in 1996 by philanthropist and digital
guru Brewster Kahle. His intention was to build an internet library so that
researchers,  historians,  and  scholars  could  access  historical  collections 
that  exist  in  digital  format.  Its  goals  have  changed  as  the  internet  has 
evolved,  with  catalogues  of  most  types  of  media  now  available.  The  site 
is  perhaps  most  famous  for  its  "waybackmachine"  which  carries 
snapshots,  going  back  a  decade,  of  millions  of  web  sites.  The  Internet 
Archive  has  been  one  of  the  forerunners  of  high-resolution  video 
downloads  and  video  hosting.  If  you  don't  need  DRM  then  it's  as  good  a 
place  as  any  to  publish  your  videos  online. 
Imagine — all the world's information at your service
with  just  a  few  clicks  of  the  mouse.  It's  a  dream  that 
Brewster  Kahle  has  held  onto  for  the  past  20  years  and 
is now seeing through to reality in his role at the Internet
Archive, where he serves as chairman of the board.
The  Internet  Archive  was  founded  in  1996  to  build  an 
"Internet  library"  that  will  offer  permanent  access  for 
researchers  and  scholars  to  historical  collections  that  exist 
in  digital  format.  Kahle  is  the  force  behind  that  effort. 

Prior  to  his  work  with  the  Internet  Archive,  Kahle 
pioneered  the  Internet's  first  publishing  system,  known  as 
WAIS  (Wide  Area  Information  Server),  which  was  sold  to 
AOL  in  1995.  He  then  cofounded  Alexa  Internet,  which 
was  sold  to  Amazon.com  in  1999.  Kahle  earned  a  B.S. 
from  the  Massachusetts  Institute  of  Technology  in  1982. 
He  studied  artificial  intelligence  with  Marvin  Minsky 
and  W.  Daniel  Hillis.  In  1983,  he  helped  start  Thinking 
Machines,  a  parallel  supercomputer  maker,  serving  as  a 
lead  engineer  for  six  years. 

Discussing  the  potential  of  the  Internet  Archive  with 
Kahle is Stuart Feldman, vice president of Internet technology
for IBM. Before that, he was director of the IBM
Institute  for  Advanced  Commerce  and  head  of  computer 
science  research. 

Prior to coming to IBM in 1995, Feldman spent 11
years at Bellcore, where he held several research management
positions. He spent 10 years before that as a
computer science researcher at Bell Labs. Feldman was a
member of the original Unix research team and is best
known as the creator of the Make configuration management
system, as well as the author of the first Fortran-77
compiler.

Feldman  received  an  A.B.  in  astrophysical  sciences 
from Princeton University and a Ph.D. in applied mathematics
from the Massachusetts Institute of Technology.

STUART  FELDMAN:  How  is  it  that  you  ended  up  in  this 
most  amazing  role  as  the  digital  librarian  of  the  Internet 
Archive?  You  had  a  string  of  obvious  successes,  making  a 
major  mark  on  a  number  of  companies.  Then  you  made 
this  interesting  apparent  left  turn  into  running  a  unique 
nonprofit  specialized  service. 

24 June 2004 QUEUE


THE DIGITAL AGE


BREWSTER KAHLE: This is all part of one theme that was floating in the air
when I was in college: to build a digital library. The thing that gets me
springing out of bed in the morning and has for the last 20 years is the idea
that we could have universal access to all knowledge.

It goes back very deep in the human psyche to the
Library of Alexandria, which was in many ways the
culmination of the Greeks' vision of knowledge as being
worthwhile in and of itself. The idea is to take the Library
of Alexandria another step further and make the published
works of humankind accessible to everyone, no
matter where they are in the world. We hope that then
everyone can add to this grand library. Current computers
and the Internet are making this conceivable. This
seems to be the opportunity of our time, in the way that
the generation before got to lay claim to landing a man
on the moon. That was something that humankind can
point at for centuries as a worthwhile achievement.
SF:  What  do  you  picture  as  the  content?  You  referred  to 
the  published  literature.  Then  you're  obviously  talking 
about  being  able  to  add  video  literature  or  radio  literature. 
BK: Humankind started recording things with the Sumerian
tablet, so we might as well start there. We're talking
about all books, all music, all video, all Web content,
all software ever produced that was meant for any form
of dissemination or for passing down from one generation
to the next. It's not necessarily everybody's musings
inside their heads. We'll cut our area into a smaller, more
manageable set.

SF:  So  not  everybody's  laundry  bills  will  necessarily  be 
included. 

BK:  I  don't  think  that's  the  first  order  of  business. 
SF:  This  is  ambitious  enough. 

BK: Yes, but it's also quite doable. There are four questions:
Should we do this? Can we do this? May we do
this? And will we do this?

The first question of should we do this, I'm going to
take  as  almost  a  postulate  of  yes. 
SF: Because, obviously, not enough people have taken
that as a postulate, since it wasn't being done very effectively
before you.

BK: Yes, it's baked into the Enlightenment era of humankind—that
knowledge is important to fulfilling ourselves
as people and for building societies that grow and
prosper. It's also baked into the American Constitution.
It's fairly fundamental to the Renaissance, which is the
rebirth of the Greek ideals.

I've  grown  up  within  this  idea  that  universal  education 
is  good,  and  that  people,  if  they  can  build  on  the  works 
of  others,  achieve  more.  But  this  approach  is  not  always 
in  favor.  Not  all  times  in  history  encourage  open  societies 
and  open  knowledge. 
SF:  Your  statements  sound  very  American. 
BK: Absolutely, I'm very American. I see what we're doing
as being very much in the tradition of Ben Franklin's and
Carnegie's vision of the library system and sort of the
Thomas Jefferson ideal of making an educated populace.

Then there is the question of "can we?" Within technological
audiences, this is often the issue.

The  "may  we?"  question  is  legal  and  societal. 


SF: You're doing it, so obviously there is a way. When did
you  decide  you  could  do  this? 
BK:  While  going  to  a  technical  college  in  the  '70s,  it 
became  quite  clear  with  the  advent  of  Moore's  law  that 
you  could  name  the  year  when  all  books,  all  movies,  all 
music  could  be  stored  on  computers. 
SF: Presumably, things like the Internet added a new
wrinkle  here. 

BK:  How  to  move  around  all  this  information  was  a  piece 
we  were  missing,  and  that's  why  many  of  us  worked  on 
an  open  Internet.  The  storage  looked  like  that  would  all 
be  taken  care  of.  And  the  computation,  no  problem. 

Let's  consider  the  question  of  how  much  information 
there  is.  If  you  break  it  down,  it  turns  out  to  be  not  that 
big  of  a  deal.  The  largest  print  library  in  the  world,  which 
is  the  Library  of  Congress,  has  about  28  million  volumes. 
A  book  is  about  a  megabyte.  That's  just  the  ASCII  of 
a book, if you put it in Microsoft Word. So 28 million
megabytes is 28 terabytes, which fits in a bookshelf and
costs  about  $60,000  right  now.  Storing  books  in  ASCII  is 
no  problem,  and  the  scanned  images  are  more  but  still 
affordable. 
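
The back-of-envelope estimate above can be checked in a few lines of Python. This is only a sketch: the 28 million volumes, one megabyte per book, and $60,000 figures come from the interview, while the decimal unit conversion (1 TB = 1,000,000 MB) and all names in the code are assumptions made for illustration.

```python
# Figures quoted in the interview (2004).
VOLUMES = 28_000_000   # Library of Congress print volumes
MB_PER_BOOK = 1        # plain-text (ASCII) size of one book, in megabytes
COST_USD = 60_000      # quoted cost of the storage hardware

# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def library_size_tb(volumes: int, mb_per_book: float) -> float:
    """Return the total plain-text size of a library in terabytes."""
    return volumes * mb_per_book / MB_PER_TB


total_tb = library_size_tb(VOLUMES, MB_PER_BOOK)
cost_per_tb = COST_USD / total_tb
print(f"{total_tb:.0f} TB total, about ${cost_per_tb:,.0f} per TB")
# prints: 28 TB total, about $2,143 per TB
```

The per-terabyte price is simply implied by the two quoted numbers; it is not a figure Kahle states.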

Scanning  books  costs  between  $5  and  $20.  That's  the 
mechanical  cost  if  you  just  wanted  to  scan  a  book  and  end 
up  with  the  images  of  the  pages  at  high  enough  resolution 
that  you  could  print  it  on  a  high-end  laser  printer  so  it 
would be a good facsimile at 600 DPI, color—a nice-looking
book. So books are doable, in terms of technology.

Now  let's  take  music.  It's  been  estimated  that  there  are 
about  2  to  3  million  albums.  In  terms  of  salable  units — 
things that were sold as either 78s, LPs, or CDs—that's the
universe  of  commercial  music.  If  you  do  the  math  again, 
it's  a  few  more  of  your  bookshelves.  So  you're  still  not 
talking  about  anything  daunting. 

If you take movies and video, Rick Prelinger [founder
of a film collection known as the Prelinger Archives]
estimated that the total number of theatrical releases of
movies  was  between  100,000  and  200,000.  Again  if  you  do 
the  math,  based  on  DVD  quality,  you  come  up  with  low 
numbers  of  petabytes  [one  petabyte  is  1  million  gigabytes]. 
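
The movie estimate can be sketched the same way. The range of 100,000 to 200,000 releases and the petabyte definition come from the interview; the per-film size is an assumption (one single-layer DVD, roughly 4.7 GB), since the interview says only "DVD quality".

```python
# Range of theatrical releases quoted in the interview.
FILMS_LOW, FILMS_HIGH = 100_000, 200_000

# Assumption (not from the interview): one film fills a single-layer DVD.
GB_PER_FILM = 4.7

# From the interview: one petabyte is 1 million gigabytes.
GB_PER_PB = 1_000_000


def collection_size_pb(films: int, gb_per_film: float = GB_PER_FILM) -> float:
    """Return the storage needed for a film collection, in petabytes."""
    return films * gb_per_film / GB_PER_PB


low = collection_size_pb(FILMS_LOW)
high = collection_size_pb(FILMS_HIGH)
print(f"{low:.2f} to {high:.2f} PB")
# prints: 0.47 to 0.94 PB
```

Under this assumption the total lands just below one petabyte; assuming dual-layer discs instead would roughly double it, which is still in the range of the "low numbers of petabytes" described.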
SF:  So,  across  a  society  this  is  not  a  big  deal. 
BK: Correct, we can afford this. The cumulative budgets
of all of the libraries in the United States have been
estimated between $12 billion and $24 billion a year.
Interestingly, between one-quarter and one-third of that
money ($3 to $8 billion) now goes to publishers' products.
That's a lot of money, and everyone gets a lot out
of it. With new technology we can multiply the effect of
our spending in terms of serving the public and rewarding
creators.
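
The "$3 to $8 billion" range follows from pairing the smallest budget with the smallest share and the largest budget with the largest share. A minimal check, using only the figures quoted above (the variable names are illustrative):

```python
from fractions import Fraction

# Annual library budgets in billions of dollars, as quoted.
BUDGET_LOW_BN, BUDGET_HIGH_BN = 12, 24

# Share of the budget that goes to publishers' products, as quoted.
SHARE_LOW, SHARE_HIGH = Fraction(1, 4), Fraction(1, 3)

spend_low_bn = BUDGET_LOW_BN * SHARE_LOW     # smallest budget, smallest share
spend_high_bn = BUDGET_HIGH_BN * SHARE_HIGH  # largest budget, largest share
print(f"${spend_low_bn} to ${spend_high_bn} billion")
# prints: $3 to $8 billion
```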

SF:  Are  people  going  to  read  books  on  their  computers? 
BK: For delivering public-domain books, the idea of reading
on screens is still far from ideal. We developed a way
not only to combat the need to have a computer to read a
book in our archive, but also to let people read them the
old-fashioned way: the Internet bookmobile. Our general
philosophy is to use commodity components, so we build
a bookmobile that costs a total of $15,000 including the
car.

SF:  This  is  a  bookmobile  without  any  books? 
BK: Without any physical books. It prints them on
demand. There is a satellite dish on top, a printer, a
binder, and a cutter, and you walk away with a paperback
of any of the public-domain books available on the 'Net.
SF:  What's  the  incremental  cost  for  a  typical  book? 
BK: A 100-page black-and-white book with current toner
and paper costs in the United States is $1, not figuring
labor costs, rights costs, or depreciation of capital. That's
an interesting number, because at a buck a book, it turns
out that for a library, it could be less expensive to give
books away than to loan them. In his book, Practical Digital
Libraries, Michael Lesk reported that it cost Harvard
incrementally $2 to loan a book out and bring it back and
put it on the shelf. This is not figuring in the warehousing
costs and all the building costs. This is just the incremental
cost of loaning a book out.

Even  if  you  put  some  fee  in  for  the  author,  it  looks  cost 
effective  to  print  and  bind  many  books  locally. 
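
The comparison above can be sketched as a few lines of arithmetic. This is only an illustration using the two figures quoted in the interview ($1 to print, $2 to loan); the function names and the idea of a per-copy author fee parameter are assumptions added here, and the sketch ignores labor, rights, and capital costs just as the quoted figure does.

```python
# Figures quoted in the interview; everything else is illustrative.
PRINT_COST_PER_BOOK = 1.00  # 100-page black-and-white book, toner and paper only
LOAN_COST_PER_BOOK = 2.00   # incremental cost of one loan (figure attributed to Lesk)


def giveaway_is_cheaper(author_fee: float = 0.0) -> bool:
    """Return True if printing a copy to give away costs less than one loan.

    author_fee is a hypothetical per-copy payment to the author.
    """
    return PRINT_COST_PER_BOOK + author_fee < LOAN_COST_PER_BOOK


def break_even_author_fee() -> float:
    """Per-copy author fee at which printing costs the same as one loan."""
    return LOAN_COST_PER_BOOK - PRINT_COST_PER_BOOK
```

Under these numbers, any author fee below the break-even value returned by `break_even_author_fee()` leaves printing cheaper than a single loan.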
SF:  So,  running  a  self-service  kiosk  would  be... 
BK:  ...more  cost  effective. 

SF: And you could let people burn them afterward if
they  dare.  The  book  would  say,  "Please  do  not  return  this 
book." 

BK:  Or,  "Please  give  it  to  somebody  else."  I  think  we're 
not  setting  ourselves  up  for  tearing  down  forests  by  this 
system  since  people  may  be  more  likely  to  read  books 
they  have  worked  to  print  and  bind.  We  are  trying  to 
avoid  the  inefficiencies  of  the  current  book  distribution 
system,  and  at  the  same  time  offer  a  much  broader  range 
of  books  to  everyone. 

A  year  ago  in  San  Francisco,  we  developed  a  print-on- 
demand bookmobile. I drove it across the country with
my  8-year-old  son,  making  books  at  schools,  libraries, 
museums,  and  even  in  front  of  the  Supreme  Court.  It 
worked. 

We have now spun off a not-for-profit called Any-
where Books that's pursuing this idea. World Bank has
funded a test that I was delighted to help launch in
Uganda recently. If we could make this technology work
in  San  Francisco  and  in  rural  Uganda,  then  we  might 
have  something. 

SF:  What  has  the  demand  been  like? 
BK:  They  love  it.  In  this  rural  area  they  have  created  a 
reading  program  for  the  first  time.  This  is  the  first  time 
some  of  these  kids  have  ever  owned  a  book.  We  would 
like  to  see  this  grow  within  the  library  system  and  the 
Internet  cafe  system. 

SF:  Ignoring  the  vehicle,  what  does  it  cost  to  run? 
BK:  The  capital  cost  is  about  $5,000  or  $6,000  in  the 
United  States  to  buy  the  printer  and  binder  and  stuff.  We 
think  interested  companies  can  get  this  below  $2,000, 
including  the  computer,  with  some  creative  product 
design.  At  that  point  you  could  have  tens  of  thousands  of 
these  very  quickly. 


SF:  This  really  sounds  wonderful.  Now,  what  are  the  flies 
in  the  ointment?  Why  don't  I  see  a  truck  going  up  and 
down  the  street  right  now? 

BK:  The  technology  is  still  quite  early.  But  I'd  say  the 
biggest  barrier  for  achieving  this  goal  is  not  technologi- 
cal. The technology problems are easy. Where we find the
stumbling block now is actually the mind-set change that
this is possible.


I'd say we have the three characteristics required to be
able to pull this off, and they're in our hands for the first
time in history.

We have the storage technology to be able to store all
knowledge  again. 

We  have  the  mechanism  of  doing  distribution — uni- 
versal access— using  the  Internet  for  getting  things  close 
to  people.  And  then  we  need  different  mechanisms  for 
the  last  mile. 

The third characteristic, which is probably the least
appreciated, is that we have the political will and the
societal  will. 

With  those  three — the  storage,  the  distribution,  and 
the  political  will — we  can  leave  something  for  our  chil- 
dren that  we  can  be  proud  of. 

SF:  Does  that  mean  a  permanent  establishment  of  some 
sort? 

BK: Yes, but I would say we need more than one estab-
lishment. If you look at the history of libraries, you see
that they tend to be burned. The new guys don't want
the  old  stuff  around.  So  the  lesson  of  the  first  Library  of 
Alexandria  is  "don't  have  just  one  copy." 

The  collections  of  the  Internet  Archive  are  here  in 
San  Francisco,  where  we're  very  conscious  of  being  in  an 
earthquake zone. Also, we're in an upstart country that's
only 200 years old. All sorts of things can go wrong.

Some  scientists  have  learned  to  have  copies  of  seed 
banks  and  data  sets  on  different  continents  to  aid  preser- 
vation. We,  in  the  library  world,  could  do  the  same  sorts 
of  things. 

SF:  Would  those  hubs  be  complete  mirrors? 
BK:  We  envision  complete  copies  of  everything  else  in 
other  Internet  Archives  around  the  world.  They  may  have 
limited  rights  of  what  they  can  do  with  them.  But  at  least 
it's  preserved. 

Our  first  agreement  in  this  direction  is  with  the 
new  Library  of  Alexandria  in  Alexandria,  Egypt,  where 
we  donated  a  copy  of  our  collection  in  2001.  We  have 
donated  100  terabytes  of  computer  facilities  and  the  data 
to  go  on  it.  Raj  Reddy  from  Carnegie  Mellon  University 
has  donated  book-scanning  facilities,  so  the  library  is 
digitizing  its  Arabic  collection. 

SF:  It's  the  upgraded  version  of  the  Library  of  Congress. 
BK: I would not go that far, because the collections in the
Library  of  Congress  are  fantastic,  but  we  hope  that  some 
of  these  ideas  will  be  widely  adopted. 

But the key thing now is access. The way to start in
this  game  is  with  a  petabyte,  a  gigabit  per  second,  and 
$100  million  in  endowment. 
SF: And how much computing?
BK: The computing that comes along with a petabyte—a
thousand  or  a  couple  of  thousand  computers. 
SF:  So  you  need  1,000  processors,  aggregate  external  net- 
working of  a  gigabit  a  second,  and  a  petabyte  of  persistent 
storage. 

BK:  Currently  that  is  what  it  takes,  but  it  will  be  beyond 
that  soon.  The  endowment  of  say,  $100  million,  can  pro- 
vide a  funding  stream  to  keep  the  bits  accessible  and  fund 
the  transition  to  new  technologies  as  they  come  along. 
We  now  have  San  Francisco  and  Alexandria  getting 
there. The next one we hope will be in Amsterdam,
because  it's  a  really  good  place  for  bandwidth,  technol- 
ogy, and  a  cultural  in-between  for  Europe. 


SF: What about the intellectual property law issues? What
about control issues? What about the export and import
of cultural property? There is restricted Internet access to
libraries, and there are the traditional locked books. So
there are negatives in the tradition.
BK: This begs the question of whether it is to society's
benefit to live in this future. Do we want to make this
step? Or is the status quo serving us well enough that,
hey, even though technology affords us the possibility of
change, let's not bother, thanks very much—we're just
happy the way we are?

I'm not a technological inevitability guy. I don't
believe that everything that can be built must be built.
People will try to understand if they will be better off or
worse off by following this path. But if that is the central
issue, we are in good shape. Then we have to work with
legislatures, the judiciary, and our educational institu-
tions to make the adjustments necessary to build this
future.

We have some reason to be optimistic. When I worked
on an Internet publishing system called WAIS [Wide Area
Information Server] in 1989, we worked with many print
publishers.

The print publishers were wonderful. They started
with experiments that didn't cost them very much to test
the waters and then dove in. The "new media" depart-
ments in the early '90s were later melded in to become a
normal part of how newspapers and periodicals worked.

You've got some false starts in the book world—the
e-book stumble, which was regrettable. But it's moving
along.

What the music and movie guys are doing, I can't tell
you. I have not worked with many businesspeople who
want to spend much time lobbying or in court.

During the '90s there was a great push to put a lot of
government materials online in the United States. We've
seen a lot of that momentum erode. I wouldn't say it's
partisan, but there are those who believe that broad
access to information is worth the risks and there are
those who prefer control. Over the years, popular support
swings back and forth.

People are starting to expect that information is avail-
able online. There is a growing realization that certainly
students, if not many professionals, use the Internet as
their information resource of first resort—and, in fact,
one study suggests, as their only resort in 40 percent of
the cases. This estimate may even be too low. So the state-
ment, "if it's not on the Internet, it's as if it doesn't exist,"
seems to be becoming true.

SF:  What  about  being  able  to  distinguish  the  best  from 
the  rest?  My  faculty  friends  comment  on  the  ability 
of  their  students  to  get  stuff  off  the  Internet,  but  they 
are  not  able  to  tell  right  from  wrong,  or  plausible  from 
implausible.  And,  of  course,  every  parent  has  a  set  of  hor- 
ror stories  about  materials  they  wish  their  children  hadn't 
gotten  yet. 

BK:  We  have  a  couple  of  problems.  One  is  that  a  lot  of 
the  great  literature  is  not  available  online.  This  is  a  major 
screw-up  at  a  societal  level.  Students  looking  around  on 
the  search  engine  of  the  moment  won't  find  these  materi- 
als. They  might  have  to  go  to  the  restricted  terminal  in 
a  public  library  during  limited  times  and  use  a  different 
kind  of  search  engine,  and  even  then  things  may  not  be 
available.  This  is  no  way  to  support  a  culture. 

Then  there's  this  other  issue  of  how  do  you  find  the 
good  stuff  and  separate  it  from  the  bad?  I'd  say  that  my 
9-year-old has a better bull detector than I did in college.
He  is  inundated  with  propaganda,  and  he  is  finding  his 
way.  Developing  these  skills  seems  to  be  something  kids 
learn  early  these  days. 

I'm  also  quite  impressed  by  the  current  search  engines, 
where  with  just  a  couple  of  words,  they're  able  to  come 
up  with  a  couple  of  answers  in  the  top  10  out  of  billions 
of  potential  documents.  In  a  flash.  The  technology  is 
keeping  up  pretty  well. 

SF:  Do  you  believe  that  those  technologies,  when  applied 
to  your  universal  library  as  opposed  to  the  Web  alone, 
will  be  satisfactory? 

BK: Yes. I think we'll get more complicated than having
one search engine for everything. Things will become
more interesting and more complicated as the next
decades  roll  on.  But  I  don't  think  it's  beyond  our  techno- 
logical abilities  to  be  able  to  pull  this  off. 

SF: It's not just books that you will be collecting; you will
likely be collecting copies of software. So your intent is
not simply to publish words, but...
BK:  It's  everything.  It  has  been  estimated  that  there 
are  50,000  titles  of  packaged  software.  This  is  a  doable 
number  of  items  to  copy  onto  more  durable  storage  and 
provide  emulated  environments  to  be  able  to  run  them 
again. 

We  have  found  that  preserving  older  packaged  soft- 
ware is  more  difficult  because  they  used  copy  protection 
for  a  time,  but  that  had  largely  disappeared  in  the  late 
1980s.  This  may  serve  as  an  interesting  historical  note, 
given  all  the  current  work  on  copy  protection,  or  DRM 
[digital rights management], for movies and music. The
software  industry  could  not  get  DRM  to  work  for  itself; 
why  does  the  movie  and  music  industry  think  the  soft- 
ware industry  can  make  it  work  for  them? 

If  I  buy  Microsoft  Office,  I  can  put  it  on  a  computer, 
I  can  copy  it  onto  another  computer.  There  is  a  license 
key  that  you  can  put  in  a  file.  This  is  how  the  industry 
has  grown  to  protect  its  property;  it  is  not  through  digital 
rights  management.  In  packaged  software,  it's  based  on 
law  rather  than  technological  measures. 


SF:  Can  I  ask  you  to  go  back  for  a  moment  to  technology 
and  how  it  works  both  for  your  bookmobile  and  for  the 
Alexandria  library? 

BK:  We're  a  small  organization  and  the  only  way  we  can 
work  well  is  by  working  with  lots  of  other  organizations. 
We  find  that  we're  a  technology  partner  for  others — espe- 
cially in  the  library  world  where  there  are  many  experts 
in  specific  fields,  but  few  technologists  that  can  build  and 
maintain  petabyte  machines. 

Technologically,  what  we  use  for  storage  are  Linux 
machines  built  on  desktop  Intel,  AMD,  and  Via  proces- 
sors, with  four  hard  drives  each.  Currently,  those  are 
300-gigabyte  hard  drives.  These  are  stacked  up  and  run 
without  modification  from  normal  Linux.  We  mirror 
across  machines  and  do  geographic  mirroring.  We  don't 
use  RAID. 
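
As a rough check on how the storage figures above relate to the "thousand or a couple of thousand computers" mentioned earlier, the machine count for a petabyte can be estimated from the quoted drive size and drives per machine. This is a back-of-the-envelope sketch only: it assumes decimal units (1 petabyte = 1,000,000 gigabytes) and treats mirroring as a simple multiplier on the number of copies, neither of which is spelled out in the interview.

```python
import math

DRIVE_GB = 300             # per-drive capacity quoted in the interview
DRIVES_PER_MACHINE = 4     # drives per machine quoted in the interview
PETABYTE_GB = 1_000_000    # assumption: decimal petabyte


def machines_needed(total_gb: int, copies: int = 1) -> int:
    """Estimate machines required to hold `copies` full copies of `total_gb`.

    Ignores filesystem overhead and spare capacity.
    """
    per_machine_gb = DRIVE_GB * DRIVES_PER_MACHINE
    return math.ceil(total_gb * copies / per_machine_gb)
```

With these assumptions, `machines_needed(PETABYTE_GB)` gives 834 machines for a single copy and `machines_needed(PETABYTE_GB, copies=2)` gives 1,667 for a mirrored pair, which falls in the range quoted in the conversation.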

The  Web  collection  grows  at  about  20  terabytes  a 
month.  Alexa  Internet  is  doing  most  of  the  crawling,  and 
we  also  do  some  with  our  own  open  source  crawlers. 
SF:  What  sort  of  networking  do  you  have,  and  what  sorts 
of  access  rates  do  you  have? 

BK:  We  use  500  megabits  per  second  of  bandwidth  almost 
all  the  time.  This  is  about  5  terabytes  of  downloads  a  day 
of  mostly  rich  media  files.  The  demand  has  been  at  least 
quadrupling  each  year. 
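
The bandwidth and download figures above can be cross-checked with a unit conversion. The sketch below assumes decimal units (1 megabit = 10^6 bits, 1 terabyte = 10^12 bytes) and a link that is saturated all day; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def terabytes_per_day(megabits_per_second: float) -> float:
    """Convert a sustained rate in megabits per second to terabytes per day."""
    bits_per_day = megabits_per_second * 1e6 * SECONDS_PER_DAY
    bytes_per_day = bits_per_day / 8
    return bytes_per_day / 1e12
```

A sustained 500 megabits per second works out to roughly 5.4 terabytes a day under these assumptions, in line with the "about 5 terabytes" quoted.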


The Wayback machine, which is a sort of zero-order
interface  to  how  to  use  this  material,  allows  you  to  surf 
the  Web  as  it  once  was.  It's  available  on  archive.org,  so 
you  can  type  in  an  URL  and  see  past  versions  and  surf  the 
Web  at  different  time  periods. 

That  gets  about  8  million  hits  a  day,  or  about  100  hits 
per  second.  That's  running  on  this  Linux  cluster  where 
there's no Cisco, no Oracle, no Sun, no special anything.
Everything is built out of bricks, along the Jim Gray [head
of Microsoft's Bay Area Research Center] model. We do
get help from people at IBM Almaden, HP Labs, Microsoft
Labs—all helping to build these petabyte systems.
SF: How big is this system physically?
BK: The current active area where our machines are is
about 1,000 square feet.
SF:  Technologically,  do  you  have  a  wish  list? 
BK:  Yes,  that  we  stay  on  track  with  Moore's  law— that  the 
disk  guys  continue  at  it,  that  the  processor  guys  continue 
at  it.  Moore's  law  says  that  if  we  spend  the  same  amount 
in  five  years  we  will  get  10  times  more  of  whatever  it 
is.  Probably  one  of  the  most  worrisome  problems— and 
it's  actually  not  a  technological  problem — is  the  com- 
munications guys.  The  fiber  engineers  have  been  doing 
great  work.  Those  guys  are  stripping  ahead  of  everybody. 
Moore's  law  is  nothing  to  them.  They're  awesome. 

But  the  pricing  of  Internet  bandwidth  has  not  been 
coming down at a Moore's-law pace. I'm really worried
that  it's  going  to  kill  the  disk-drive  industry  and  the 
processor  industry,  because  unless  we  move  some  bits 
around,  they  are  going  to  wilt.  Right  now  we  need  last- 
mile  infrastructure,  and  we  need  a  mechanism  for  getting 
to  fiber  somewhere  closer  to  cost. 

Again,  those  aren't  technological  problems.  We  just 
need  a  Moore's-law  corporate  mentality  to  spread  to  the 
communications  companies — twice  as  much  for  the  same 
money  every  18  months.  Many  companies  have  done 
well  with  this  approach,  but  if  all  of  our  industries  don't 
stay  in  step,  we  may  falter. 
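
The two ways Moore's law is phrased in this answer ("10 times more" in five years, and "twice as much for the same money every 18 months") describe the same exponential. A minimal sketch of that arithmetic, with an illustrative function name and the 18-month doubling period taken from the text:

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Capacity per dollar after `months`, relative to today, if it doubles
    every `doubling_months` months."""
    return 2.0 ** (months / doubling_months)
```

Evaluating `growth_factor(60)` gives a factor of just over 10, so doubling every 18 months and a tenfold gain in five years are consistent with each other.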


SF:  To  what  extent  do  you  view  yourself  as  part  of  the 
open  source  philosophical  movement? 
BK:  We  see  ourselves  as  absolutely  part  of  the  open  source 
environment — though  I  should  say  that  the  Internet 
Archive,  which  is  a  tiny  organization,  has  already  spun 
off  four  companies  in  the  last  24  months.  We  don't  see 
ourselves  as  anticommercial  in  any  sense.  But  we  firmly 
believe  open  source  is  the  best  way  to  conduct  our  busi- 
ness. So  almost  all  of  the  software  we  use  is  open  source. 
We  are  fabricating  our  next-generation  petabyte 
machine right now. Since we have paid for the metal box
designs, we will make those open source (GPL, GNU Gen-
eral Public License) as well. The idea is to have everything
from  the  physical  hardware  to  the  operating  system  open. 


The Web has a lot of materials that weren't designed
for eternity. If people request that they be taken out of
the Wayback machine, then we do that. We try to under-
stand where is the right balance.


SF: What would you be doing if you weren't doing the
library?

BK: Right now we're really settled: what we're trying
to achieve is the preservation and access to all human
knowledge.

One  thing  that  I  tried  to  do  way  back  when  was  to 
help  protect  people's  privacy.  People  in  general  will  throw 
away  their  privacy  without  understanding  the  longer- 
term  implications. 

SF:  The  privacy  issue  is  an  interesting  one  to  bring  in.  It 
can  be  very  embarrassing  to  go  find  out  what  you  actually 
said  in  1985. 

BK:  Absolutely.  We  try  to  stay  directly  in  touch  with  the 
thinkers  in  this  area,  so  that  we  can  make  a  library  that 
has the right balances to it. I'm on the board of EFF [Elec-
tronic Frontier Foundation]. It's the ACLU of the digital
world. 


SF:  We  haven't  discussed  the  implications  of  the  possibly 
enormous data growth that comes with video. When
video catches on, suddenly more zeros may show up on
some of your technology needs. The privacy implications
of capturing 10 million surveillance cameras are too awful
to  think  about. 

BK:  When  everybody  has  a  camera  pointed  at  their  kid's 
crib— do  we  really  need  to  have  all  of  that  in  the  library? 
I would say we will become more selective. Right now we
don't have to be selective because the technology makes
it easy enough. And we're not smart enough to know
exactly what it is historians want.

By KEVIN KELLY

Correction  Appended 

In  several  dozen  nondescript  office  buildings  around  the  world,  thousands  of  hourly  workers 
bend  over  table-top  scanners  and  haul  dusty  books  into  high-tech  scanning  booths.  They  are 
assembling  the  universal  library  page  by  page. 

The  dream  is  an  old  one:  to  have  in  one  place  all  knowledge,  past  and  present.  All  books,  all 
documents,  all  conceptual  works,  in  all  languages.  It  is  a  familiar  hope,  in  part  because  long 
ago  we  briefly  built  such  a  library.  The  great  library  at  Alexandria,  constructed  around  300 
B.C.,  was  designed  to  hold  all  the  scrolls  circulating  in  the  known  world.  At  one  time  or 
another,  the  library  held  about  half  a  million  scrolls,  estimated  to  have  been  between  30  and 
70  percent  of  all  books  in  existence  then.  But  even  before  this  great  library  was  lost,  the 
moment  when  all  knowledge  could  be  housed  in  a  single  building  had  passed.  Since  then,  the 
constant  expansion  of  information  has  overwhelmed  our  capacity  to  contain  it.  For  2,000 
years, the universal library, together with other perennial longings like invisibility cloaks,
antigravity  shoes  and  paperless  offices,  has  been  a  mythical  dream  that  kept  receding  further 
into  the  infinite  future. 

Until  now.  When  Google  announced  in  December  2004  that  it  would  digitally  scan  the  books 
of  five  major  research  libraries  to  make  their  contents  searchable,  the  promise  of  a  universal 
library  was  resurrected.  Indeed,  the  explosive  rise  of  the  Web,  going  from  nothing  to 
everything  in  one  decade,  has  encouraged  us  to  believe  in  the  impossible  again.  Might  the 
long-heralded  great  library  of  all  knowledge  really  be  within  our  grasp? 

Brewster  Kahle,  an  archivist  overseeing  another  scanning  project,  says  that  the  universal 
library  is  now  within  reach.  "This  is  our  chance  to  one-up  the  Greeks!"  he  shouts.  "It  is  really 
possible  with  the  technology  of  today,  not  tomorrow.  We  can  provide  all  the  works  of 
humankind  to  all  the  people  of  the  world.  It  will  be  an  achievement  remembered  for  all  time, 
like  putting  a  man  on  the  moon."  And  unlike  the  libraries  of  old,  which  were  restricted  to  the 
elite,  this  library  would  be  truly  democratic,  offering  every  book  to  every  person. 

But  the  technology  that  will  bring  us  a  planetary  source  of  all  written  material  will  also,  in  the 
same  gesture,  transform  the  nature  of  what  we  now  call  the  book  and  the  libraries  that  hold 
them.  The  universal  library  and  its  "books"  will  be  unlike  any  library  or  books  we  have 
known.  Pushing  us  rapidly  toward  that  Eden  of  everything,  and  away  from  the  paradigm  of 
the  physical  paper  tome,  is  the  hot  technology  of  the  search  engine. 

1. Scanning the Library of Libraries

Scanning  technology  has  been  around  for  decades,  but  digitized  books  didn't  make  much 
sense  until  recently,  when  search  engines  like  Google,  Yahoo,  Ask  and  MSN  came  along. 
When  millions  of  books  have  been  scanned  and  their  texts  are  made  available  in  a  single 
database,  search  technology  will  enable  us  to  grab  and  read  any  book  ever  written.  Ideally,  in 
such  a  complete  library  we  should  also  be  able  to  read  any  article  ever  written  in  any 
newspaper,  magazine  or  journal.  And  why  stop  there?  The  universal  library  should  include  a 
copy  of  every  painting,  photograph,  film  and  piece  of  music  produced  by  all  artists,  present 
and  past.  Still  more,  it  should  include  all  radio  and  television  broadcasts.  Commercials  too. 
And  how  can  we  forget  the  Web?  The  grand  library  naturally  needs  a  copy  of  the  billions  of 
dead  Web  pages  no  longer  online  and  the  tens  of  millions  of  blog  posts  now  gone  —  the 
ephemeral  literature  of  our  time.  In  short,  the  entire  works  of  humankind,  from  the 
beginning  of  recorded  history,  in  all  languages,  available  to  all  people,  all  the  time. 

This  is  a  very  big  library.  But  because  of  digital  technology,  you'll  be  able  to  reach  inside  it 
from  almost  any  device  that  sports  a  screen.  From  the  days  of  Sumerian  clay  tablets  till  now, 
humans  have  "published"  at  least  32  million  books,  750  million  articles  and  essays,  25 
million  songs,  500  million  images,  500,000  movies,  3  million  videos,  TV  shows  and  short 
films  and  100  billion  public  Web  pages.  All  this  material  is  currently  contained  in  all  the 
libraries  and  archives  of  the  world.  When  fully  digitized,  the  whole  lot  could  be  compressed 
(at  current  technological  rates)  onto  50  petabyte  hard  disks.  Today  you  need  a  building  about 
the  size  of  a  small-town  library  to  house  50  petabytes.  With  tomorrow's  technology,  it  will  all 
fit  onto  your  iPod.  When  that  happens,  the  library  of  all  libraries  will  ride  in  your  purse  or 
wallet  —  if  it  doesn't  plug  directly  into  your  brain  with  thin  white  cords.  Some  people  alive 
today  are  surely  hoping  that  they  die  before  such  things  happen,  and  others,  mostly  the 
young,  want  to  know  what's  taking  so  long.  (Could  we  get  it  up  and  running  by  next  week? 
They  have  a  history  project  due.) 

Technology  accelerates  the  migration  of  all  we  know  into  the  universal  form  of  digital  bits. 
Nikon  will  soon  quit  making  film  cameras  for  consumers,  and  Minolta  already  has:  better 
think  digital  photos  from  now  on.  Nearly  100  percent  of  all  contemporary  recorded  music  has 
already  been  digitized,  much  of  it  by  fans.  About  one-tenth  of  the  500,000  or  so  movies  listed 
on  the  Internet  Movie  Database  are  now  digitized  on  DVD.  But  because  of  copyright  issues 
and  the  physical  fact  of  the  need  to  turn  pages,  the  digitization  of  books  has  proceeded  at  a 
relative  crawl.  At  most,  one  book  in  20  has  moved  from  analog  to  digital.  So  far,  the  universal 
library  is  a  library  without  many  books. 

But  that  is  changing  very  fast.  Corporations  and  libraries  around  the  world  are  now  scanning 
about  a  million  books  per  year.  Amazon  has  digitized  several  hundred  thousand 
contemporary  books.  In  the  heart  of  Silicon  Valley,  Stanford  University  (one  of  the  five 
libraries  collaborating  with  Google)  is  scanning  its  eight-million-book  collection  using  a 
state-of-the-art robot from the Swiss company 4DigitalBooks. This machine, the size of a
small S.U.V., automatically turns the pages of each book as it scans it, at the rate of 1,000
pages  per  hour.  A  human  operator  places  a  book  in  a  flat  carriage,  and  then  pneumatic  robot 
fingers  flip  the  pages  —  delicately  enough  to  handle  rare  volumes  —  under  the  scanning  eyes 
of  digital  cameras. 

Like  many  other  functions  in  our  global  economy,  however,  the  real  work  has  been 
happening  far  away,  while  we  sleep.  We  are  outsourcing  the  scanning  of  the  universal 
library.  Superstar,  an  entrepreneurial  company  based  in  Beijing,  has  scanned  every  book 
from  900  university  libraries  in  China.  It  has  already  digitized  1.3  million  unique  titles  in 
Chinese,  which  it  estimates  is  about  half  of  all  the  books  published  in  the  Chinese  language 
since  1949.  It  costs  $30  to  scan  a  book  at  Stanford  but  only  $10  in  China. 

Raj  Reddy,  a  professor  at  Carnegie  Mellon  University,  decided  to  move  a  fair-size 
English-language  library  to  where  the  cheap  subsidized  scanners  were.  In  2004,  he  borrowed 
30,000  volumes  from  the  storage  rooms  of  the  Carnegie  Mellon  library  and  the  Carnegie 
Library  and  packed  them  off  to  China  in  a  single  shipping  container  to  be  scanned  by  an 
assembly  line  of  workers  paid  by  the  Chinese.  His  project,  which  he  calls  the  Million  Book 
Project,  is  churning  out  100,000  pages  per  day  at  20  scanning  stations  in  India  and  China. 
Reddy  hopes  to  reach  a  million  digitized  books  in  two  years. 

The  idea  is  to  seed  the  bookless  developing  world  with  easily  available  texts.  Superstar  sells 
copies  of  books  it  scans  back  to  the  same  university  libraries  it  scans  from.  A  university  can 
expand  a  typical  60,000-volume  library  into  a  1.3  million-volume  one  overnight.  At  about  50 
cents per digital book acquired, it's a cheap way for a library to increase its collection. Bill
McCoy,  the  general  manager  of  Adobe's  e-publishing  business,  says:  "Some  of  us  have 
thousands  of  books  at  home,  can  walk  to  wonderful  big-box  bookstores  and  well-stocked 
libraries  and  can  get  Amazon.com  to  deliver  next  day.  The  most  dramatic  effect  of  digital 
libraries will be not on us, the well-booked, but on the billions of people worldwide who are
underserved  by  ordinary  paper  books."  It  is  these  underbooked  —  students  in  Mali,  scientists 
in  Kazakhstan,  elderly  people  in  Peru  —  whose  lives  will  be  transformed  when  even  the 
simplest  unadorned  version  of  the  universal  library  is  placed  in  their  hands. 

2.  What  Happens  When  Books  Connect 

The  least  important,  but  most  discussed,  aspects  of  digital  reading  have  been  these 
contentious  questions:  Will  we  give  up  the  highly  evolved  technology  of  ink  on  paper  and 
instead  read  on  cumbersome  machines?  Or  will  we  keep  reading  our  paperbacks  on  the 
beach?  For  now,  the  answer  is  yes  to  both.  Yes,  publishers  have  lost  millions  of  dollars  on  the 
long-prophesied  e-book  revolution  that  never  occurred,  while  the  number  of  physical  books 
sold  in  the  world  each  year  continues  to  grow.  At  the  same  time,  there  are  already  more  than 
a  half  a  billion  PDF  documents  on  the  Web  that  people  happily  read  on  computers  without 
printing  them  out,  and  still  more  people  now  spend  hours  watching  movies  on  microscopic 
cellphone  screens.  The  arsenal  of  our  current  display  technology  —  from  handheld  gizmos  to 
large flat screens — is already good enough to move books to their next stage of evolution: a
full  digital  scan. 

Yet  the  common  vision  of  the  library's  future  (even  the  e-book  future)  assumes  that  books 
will  remain  isolated  items,  independent  from  one  another,  just  as  they  are  on  shelves  in  your 
public  library.  There,  each  book  is  pretty  much  unaware  of  the  ones  next  to  it.  When  an 
author  completes  a  work,  it  is  fixed  and  finished.  Its  only  movement  comes  when  a  reader 
picks  it  up  to  animate  it  with  his  or  her  imagination.  In  this  vision,  the  main  advantage  of 
the  coming  digital  library  is  portability  —  the  nifty  translation  of  a  book's  full  text  into  bits, 
which  permits  it  to  be  read  on  a  screen  anywhere.  But  this  vision  misses  the  chief  revolution 
birthed  by  scanning  books:  in  the  universal  library,  no  book  will  be  an  island. 

Turning  inked  letters  into  electronic  dots  that  can  be  read  on  a  screen  is  simply  the  first 
essential  step  in  creating  this  new  library.  The  real  magic  will  come  in  the  second  act,  as  each 
word  in  each  book  is  cross-linked,  clustered,  cited,  extracted,  indexed,  analyzed,  annotated, 
remixed,  reassembled  and  woven  deeper  into  the  culture  than  ever  before.  In  the  new  world 
of  books,  every  bit  informs  another;  every  page  reads  all  the  other  pages. 

In  recent  years,  hundreds  of  thousands  of  enthusiastic  amateurs  have  written  and 
cross-referenced  an  entire  online  encyclopedia  called  Wikipedia.  Buoyed  by  this  success, 
many  nerds  believe  that  a  billion  readers  can  reliably  weave  together  the  pages  of  old  books, 
one  hyperlink  at  a  time.  Those  with  a  passion  for  a  special  subject,  obscure  author  or  favorite 
book  will,  over  time,  link  up  its  important  parts.  Multiply  that  simple  generous  act  by 
millions  of  readers,  and  the  universal  library  can  be  integrated  in  full,  by  fans  for  fans. 

In  addition  to  a  link,  which  explicitly  connects  one  word  or  sentence  or  book  to  another, 
readers  will  also  be  able  to  add  tags,  a  recent  innovation  on  the  Web  but  already  a  popular 
one.  A  tag  is  a  public  annotation,  like  a  keyword  or  category  name,  that  is  hung  on  a  file,  page, 
picture  or  song,  enabling  anyone  to  search  for  that  file.  For  instance,  on  the  photo-sharing  site 
Flickr,  hundreds  of  viewers  will  "tag"  a  photo  submitted  by  another  user  with  their  own 
simple  classifications  of  what  they  think  the  picture  is  about:  "goat,"  "Paris,"  "goofy,"  "beach 
party."  Because  tags  are  user-generated,  when  they  move  to  the  realm  of  books,  they  will  be 
assigned  faster,  range  wider  and  serve  better  than  out-of-date  schemes  like  the  Dewey 
Decimal  System,  particularly  in  frontier  or  fringe  areas  like  nanotechnology  or  body 
modification. 
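
One way to picture how a tag makes a file findable is a small inverted index that maps each tag to the items carrying it. The Python sketch below is only an illustration under that assumption; the `TagIndex` class and the sample photo names are hypothetical, and nothing here describes how Flickr or any library catalog actually stores tags.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index: each tag maps to the set of items carrying it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def add_tag(self, item, tag):
        # Normalize case and whitespace so "Paris" and " paris " count as one tag.
        self._items_by_tag[tag.strip().lower()].add(item)

    def search(self, tag):
        # Return matching items in sorted order so results are stable.
        return sorted(self._items_by_tag.get(tag.strip().lower(), set()))


index = TagIndex()
index.add_tag("photo-17", "goat")
index.add_tag("photo-17", "Paris")
index.add_tag("photo-42", "paris")
print(index.search("Paris"))  # ['photo-17', 'photo-42']
```

Because any reader can call `add_tag`, the vocabulary grows with the readers themselves rather than with a fixed classification scheme.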

The  link  and  the  tag  may  be  two  of  the  most  important  inventions  of  the  last  50  years.  They 
get  their  initial  wave  of  power  when  we  first  code  them  into  bits  of  text,  but  their  real 
transformative  energies  fire  up  as  ordinary  users  click  on  them  in  the  course  of  everyday 
Web  surfing,  unaware  that  each  humdrum  click  "votes"  on  a  link,  elevating  its  rank  of 
relevance.  You  may  think  you  are  just  browsing,  casually  inspecting  this  paragraph  or  that 
page,  but  in  fact  you  are  anonymously  marking  up  the  Web  with  bread  crumbs  of  attention. 
These  bits  of  interest  are  gathered  and  analyzed  by  search  engines  in  order  to  strengthen  the 
relationship  between  the  end  points  of  every  link  and  the  connections  suggested  by  each  tag. 
This  is  a  type  of  intelligence  common  on  the  Web,  but  previously  foreign  to  the  world  of 
books. 

Once  a  book  has  been  integrated  into  the  new  expanded  library  by  means  of  this  linking,  its 
text  will  no  longer  be  separate  from  the  text  in  other  books.  For  instance,  today  a  serious 
nonfiction  book  will  usually  have  a  bibliography  and  some  kind  of  footnotes.  When  books  are 
deeply  linked,  you'll  be  able  to  click  on  the  title  in  any  bibliography  or  any  footnote  and  find 
the  actual  book  referred  to  in  the  footnote.  The  books  referenced  in  that  book's  bibliography 
will  themselves  be  available,  and  so  you  can  hop  through  the  library  in  the  same  way  we  hop 
through  Web  links,  traveling  from  footnote  to  footnote  to  footnote  until  you  reach  the  bottom 
of  things. 

Next  come  the  words.  Just  as  a  Web  article  on,  say,  aquariums,  can  have  some  of  its  words 
linked  to  definitions  of  fish  terms,  any  and  all  words  in  a  digitized  book  can  be  hyperlinked  to 
other  parts  of  other  books.  Books,  including  fiction,  will  become  a  web  of  names  and  a 
community  of  ideas. 

Search  engines  are  transforming  our  culture  because  they  harness  the  power  of  relationships, 
which  is  all  links  really  are.  There  are  about  100  billion  Web  pages,  and  each  page  holds,  on 
average,  10  links.  That's  a  trillion  electrified  connections  coursing  through  the  Web.  This 
tangle  of  relationships  is  precisely  what  gives  the  Web  its  immense  force.  The  static  world  of 
book  knowledge  is  about  to  be  transformed  by  the  same  elevation  of  relationships,  as  each 
page  in  a  book  discovers  other  pages  and  other  books.  Once  text  is  digital,  books  seep  out  of 
their  bindings  and  weave  themselves  together.  The  collective  intelligence  of  a  library  allows 
us  to  see  things  we  can't  see  in  a  single,  isolated  book. 

When  books  are  digitized,  reading  becomes  a  community  activity.  Bookmarks  can  be  shared 
with  fellow  readers.  Marginalia  can  be  broadcast.  Bibliographies  swapped.  You  might  get  an 
alert  that  your  friend  Carl  has  annotated  a  favorite  book  of  yours.  A  moment  later,  his  links 
are  yours.  In  a  curious  way,  the  universal  library  becomes  one  very,  very,  very  large  single 
text:  the  world's  only  book. 

3.  Books:  The  Liquid  Version 

At  the  same  time,  once  digitized,  books  can  be  unraveled  into  single  pages  or  be  reduced 
further,  into  snippets  of  a  page.  These  snippets  will  be  remixed  into  reordered  books  and 
virtual  bookshelves.  Just  as  the  music  audience  now  juggles  and  reorders  songs  into  new 
albums  (or  "playlists,"  as  they  are  called  in  iTunes),  the  universal  library  will  encourage  the 
creation  of  virtual  "bookshelves"  —  a  collection  of  texts,  some  as  short  as  a  paragraph,  others 
as  long  as  entire  books,  that  form  a  library  shelf's  worth  of  specialized  information.  And  as 
with  music  playlists,  once  created,  these  "bookshelves"  will  be  published  and  swapped  in  the 
public  commons.  Indeed,  some  authors  will  begin  to  write  books  to  be  read  as  snippets  or  to 
be  remixed  as  pages.  The  ability  to  purchase,  read  and  manipulate  individual  pages  or 
sections  is  surely  what  will  drive  reference  books  (cookbooks,  how-to  manuals,  travel  guides) 
in  the  future.  You  might  concoct  your  own  "cookbook  shelf"  of  Cajun  recipes  compiled  from 
many  different  sources;  it  would  include  Web  pages,  magazine  clippings  and  entire  Cajun 
cookbooks.  Amazon  currently  offers  you  a  chance  to  publish  your  own  bookshelves  (Amazon 
calls  them  "listmanias")  as  annotated  lists  of  books  you  want  to  recommend  on  a  particular 
esoteric  subject.  And  readers  are  already  using  Google  Book  Search  to  round  up  minilibraries 
on  a  certain  topic  —  all  books  about  Sweden,  for  instance,  or  books  on  clocks.  Once  snippets, 
articles  and  pages  of  books  become  ubiquitous,  shuffle-able  and  transferable,  users  will  earn 
prestige  and  perhaps  income  for  curating  an  excellent  collection. 

Libraries  (as  well  as  many  individuals)  aren't  eager  to  relinquish  ink-on-paper  editions, 
because  the  printed  book  is  by  far  the  most  durable  and  reliable  backup  technology  we  have. 
Printed  books  require  no  mediating  device  to  read  and  thus  are  immune  to  technological 
obsolescence.  Paper  is  also  extremely  stable,  compared  with,  say,  hard  drives  or  even  CD's.  In 
this  way,  the  stability  and  fixity  of  a  bound  book  is  a  blessing.  It  sits  there  unchanging,  true 
to  its  original  creation.  But  it  sits  alone. 

So  what  happens  when  all  the  books  in  the  world  become  a  single  liquid  fabric  of 
interconnected  words  and  ideas?  Four  things:  First,  works  on  the  margins  of  popularity  will 
find  a  small  audience  larger  than  the  near-zero  audience  they  usually  have  now.  Far  out  in 
the  "long  tail"  of  the  distribution  curve  —  that  extended  place  of  low-to-no  sales  where  most 
of  the  books  in  the  world  live  —  digital  interlinking  will  lift  the  readership  of  almost  any  title, 
no  matter  how  esoteric.  Second,  the  universal  library  will  deepen  our  grasp  of  history,  as 
every  original  document  in  the  course  of  civilization  is  scanned  and  cross-linked.  Third,  the 
universal  library  of  all  books  will  cultivate  a  new  sense  of  authority.  If  you  can  truly 
incorporate  all  texts  —  past  and  present,  multilingual  —  on  a  particular  subject,  then  you  can 
have  a  clearer  sense  of  what  we  as  a  civilization,  a  species,  do  know  and  don't  know.  The 
white  spaces  of  our  collective  ignorance  are  highlighted,  while  the  golden  peaks  of  our 
knowledge  are  drawn  with  completeness.  This  degree  of  authority  is  only  rarely  achieved  in 
scholarship  today,  but  it  will  become  routine. 

Finally,  the  full,  complete  universal  library  of  all  works  becomes  more  than  just  a  better  Ask 
Jeeves.  Search  on  the  Web  becomes  a  new  infrastructure  for  entirely  new  functions  and 
services.  Right  now,  if  you  mash  up  Google  Maps  and  Monster.com,  you  get  maps  of  where 
jobs  are  located  by  salary.  In  the  same  way,  it  is  easy  to  see  that  in  the  great  library, 
everything  that  has  ever  been  written  about,  for  example,  Trafalgar  Square  in  London  could 
be  present  on  that  spot  via  a  screen.  In  the  same  way,  every  object,  event  or  location  on  earth 
would  "know"  everything  that  has  ever  been  written  about  it  in  any  book,  in  any  language,  at 
any  time.  From  this  deep  structuring  of  knowledge  comes  a  new  culture  of  interaction  and 
participation. 

The  main  drawback  of  this  vision  is  a  big  one.  So  far,  the  universal  library  lacks  books. 
Despite  the  best  efforts  of  bloggers  and  the  creators  of  the  Wikipedia,  most  of  the  world's 
expertise  still  resides  in  books.  And  a  universal  library  without  the  contents  of  books  is  no 
universal  library  at  all. 

There  are  dozens  of  excellent  reasons  that  books  should  quickly  be  made  part  of  the  emerging 
Web.  But  so  far  they  have  not  been,  at  least  not  in  great  numbers.  And  there  is  only  one 
reason:  the  hegemony  of  the  copy. 

4.  The  Triumph  of  the  Copy 

The  desire  of  all  creators  is  for  their  works  to  find  their  way  into  all  minds.  A  text,  a  melody,  a 
picture  or  a  story  succeeds  best  if  it  is  connected  to  as  many  ideas  and  other  works  as 
possible.  Ideally,  over  time  a  work  becomes  so  entangled  in  a  culture  that  it  appears  to  be 
inseparable  from  it,  in  the  way  that  the  Bible,  Shakespeare's  plays,  "Cinderella"  and  the 
Mona  Lisa  are  inseparable  from  ours.  This  tendency  for  creative  ideas  to  infiltrate  other 
works  is  great  news  for  culture.  In  fact,  this  commingling  of  creations  is  culture. 

In  preindustrial  times,  exact  copies  of  a  work  were  rare  for  a  simple  reason:  it  was  much 
easier  to  make  your  own  version  of  a  creation  than  to  duplicate  someone  else's  exactly.  The 
amount  of  energy  and  attention  needed  to  copy  a  scroll  exactly,  word  for  word,  or  to  replicate 
a  painting  stroke  by  stroke  exceeded  the  cost  of  paraphrasing  it  in  your  own  style.  So  most 
works  were  altered,  and  often  improved,  by  the  borrower  before  they  were  passed  on.  Fairy 
tales  evolved  mythic  depth  as  many  different  authors  worked  on  them  and  as  they  migrated 
from  spoken  tales  to  other  media  (theater,  music,  painting).  This  system  worked  well  for 
audiences  and  performers,  but  the  only  way  for  most  creators  to  earn  a  living  from  their 
works  was  through  the  support  of  patrons. 

That  ancient  economics  of  creation  was  overturned  at  the  dawn  of  the  industrial  age  by  the 
technologies  of  mass  production.  Suddenly,  the  cost  of  duplication  was  lower  than  the  cost  of 
appropriation.  With  the  advent  of  the  printing  press,  it  was  now  cheaper  to  print  thousands 
of  exact  copies  of  a  manuscript  than  to  alter  one  by  hand.  Copy  makers  could  profit  more 
than  creators.  This  imbalance  led  to  the  technology  of  copyright,  which  established  a  new 
order.  Copyright  bestowed  upon  the  creator  of  a  work  a  temporary  monopoly  —  for  14  years, 
in  the  United  States  —  over  any  copies  of  the  work.  The  idea  was  to  encourage  authors  and 
artists  to  create  yet  more  works  that  could  be  cheaply  copied  and  thus  fill  the  culture  with 
public  works. 

Not  coincidentally,  public  libraries  first  began  to  flourish  with  the  advent  of  cheap  copies. 
Before  the  industrial  age,  libraries  were  primarily  the  property  of  the  wealthy  elite.  With 
mass  production,  every  small  town  could  afford  to  put  duplicates  of  the  greatest  works  of 
humanity  on  wooden  shelves  in  the  village  square.  Mass  access  to  public-library  books 
inspired  scholarship,  reviewing  and  education,  activities  exempted  in  part  from  the 
monopoly  of  copyright  in  the  United  States  because  they  moved  creative  works  toward  the 
public  commons  sooner,  weaving  them  into  the  fabric  of  common  culture  while  still 
remaining  under  the  author's  copyright.  These  are  now  known  as  "fair  uses." 

This  wonderful  balance  was  undone  by  good  intentions.  The  first  was  a  new  copyright  law 
passed  by  Congress  in  1976.  According  to  the  new  law,  creators  no  longer  had  to  register  or 
renew  copyright;  the  simple  act  of  creating  something  bestowed  it  with  instant  and 
automatic  rights.  By  default,  each  new  work  was  born  under  private  ownership  rather  than 
in  the  public  commons.  At  first,  this  reversal  seemed  to  serve  the  culture  of  creation  well.  All 
works  that  could  be  copied  gained  instant  and  deep  ownership,  and  artists  and  authors  were 
happy.  But  the  1976  law,  and  various  revisions  and  extensions  that  followed  it,  made  it 
extremely  difficult  to  move  a  work  into  the  public  commons,  where  human  creations 
naturally  belong  and  were  originally  intended  to  reside.  As  more  intellectual  property 
became  owned  by  corporations  rather  than  by  individuals,  those  corporations  successfully 
lobbied  Congress  to  keep  extending  the  once-brief  protection  enabled  by  copyright  in  order 
to  prevent  works  from  returning  to  the  public  domain.  With  constant  nudging,  Congress 
moved  the  expiration  date  from  14  years  to  28  to  42  and  then  to  56. 

While  corporations  and  legislators  were  moving  the  goal  posts  back,  technology  was 
accelerating  forward.  In  Internet  time,  even  14  years  is  a  long  time  for  a  monopoly;  a 
monopoly  that  lasts  a  human  lifetime  is  essentially  an  eternity.  So  when  Congress  voted  in 
1998  to  extend  copyright  an  additional  70  years  beyond  the  life  span  of  a  creator  —  to  a  point 
where  it  could  not  possibly  serve  its  original  purpose  as  an  incentive  to  keep  that  creator 
working  —  it  was  obvious  to  all  that  copyright  now  existed  primarily  to  protect  a  threatened 
business  model.  And  because  Congress  at  the  same  time  tacked  a  20-year  extension  onto  all 
existing  copyrights,  nothing  —  no  published  creative  works  of  any  type  —  will  fall  out  of 
protection  and  return  to  the  public  domain  until  2019.  Almost  everything  created  today  will 
not  return  to  the  commons  until  the  next  century.  Thus  the  stream  of  shared  material  that 
anyone  can  improve  (think  "A  Thousand  and  One  Nights"  or  "Amazing  Grace"  or  "Beauty 
and  the  Beast")  will  largely  dry  up. 

In  the  world  of  books,  the  indefinite  extension  of  copyright  has  had  a  perverse  effect.  It  has 
created  a  vast  collection  of  works  that  have  been  abandoned  by  publishers,  a  continent  of 
books  left  permanently  in  the  dark.  In  most  cases,  the  original  publisher  simply  doesn't  find 
it  profitable  to  keep  these  books  in  print.  In  other  cases,  the  publishing  company  doesn't 
know  whether  it  even  owns  the  work,  since  author  contracts  in  the  past  were  not  as  explicit  as 
they  are  now.  The  size  of  this  abandoned  library  is  shocking:  about  75  percent  of  all  books  in 
the  world's  libraries  are  orphaned.  Only  about  15  percent  of  all  books  are  in  the  public 
domain.  A  luckier  10  percent  are  still  in  print.  The  rest,  the  bulk  of  our  universal  library,  is 
dark. 

5.  The  Moral  Imperative  to  Scan 

The  15  percent  of  the  world's  32  million  cataloged  books  that  are  in  the  public  domain  are 
freely  available  for  anyone  to  borrow,  imitate,  publish  or  copy  wholesale.  Almost  the  entire 
current  scanning  effort  by  American  libraries  is  aimed  at  this  15  percent.  The  Million  Book 
Project  mines  this  small  sliver  of  the  pie,  as  does  Google.  Because  they  are  in  the  commons, 
no  law  hinders  this  15  percent  from  being  scanned  and  added  to  the  universal  library. 

The  approximately  10  percent  of  all  books  actively  in  print  will  also  be  scanned  before  long. 
Amazon  carries  at  least  four  million  books,  which  includes  multiple  editions  of  the  same  title. 
Amazon  is  slowly  scanning  all  of  them.  Recently,  several  big  American  publishers  have 
declared  themselves  eager  to  move  their  entire  backlist  of  books  into  the  digital  sphere.  Many 
of  them  are  working  with  Google  in  a  partnership  program  in  which  Google  scans  their 
books,  offers  sample  pages  (controlled  by  the  publisher)  to  readers  and  points  readers  to 
where  they  can  buy  the  actual  book.  No  one  doubts  electronic  books  will  make  money 
eventually.  Simple  commercial  incentives  guarantee  that  all  in-print  and  backlisted  books 
will  before  long  be  scanned  into  the  great  library.  That's  not  the  problem. 

The  major  problem  for  large  publishers  is  that  they  are  not  certain  what  they  actually  own.  If 
you  would  like  to  amuse  yourself,  pick  an  out-of-print  book  from  the  library  and  try  to 
determine  who  owns  its  copyright.  It's  not  easy.  There  is  no  list  of  copyrighted  works.  The 
Library  of  Congress  does  not  have  a  catalog.  The  publishers  don't  have  an  exhaustive  list,  not 
even  of  their  own  imprints  (though  they  say  they  are  working  on  it).  The  older,  the  more 
obscure  the  work,  the  less  likely  a  publisher  will  be  able  to  tell  you  (that  is,  if  the  publisher 
still  exists)  whether  the  copyright  has  reverted  to  the  author,  whether  the  author  is  alive  or 
dead,  whether  the  copyright  has  been  sold  to  another  company,  whether  the  publisher  still 
owns  the  copyright  or  whether  it  plans  to  resurrect  or  scan  it.  Plan  on  having  a  lot  of  spare 
time  and  patience  if  you  inquire.  I  recently  spent  two  years  trying  to  track  down  the 
copyright  to  a  book  that  led  me  to  Random  House.  Does  the  company  own  it?  Can  I 
reproduce  it?  Three  years  later,  the  company  is  still  working  on  its  answer.  The  prospect  of 
tracking  down  the  copyright  —  with  any  certainty  —  of  the  roughly  25  million  orphaned  books 
is  simply  ludicrous. 

Which  leaves  75  percent  of  the  known  texts  of  humans  in  the  dark.  The  legal  limbo 
surrounding  their  status  as  copies  prevents  them  from  being  digitized.  No  one  argues  that 
these  are  all  masterpieces,  but  there  is  history  and  context  enough  in  their  pages  to  not  let 
them  disappear.  And  if  they  are  not  scanned,  they  in  effect  will  disappear.  But  with  copyright 
hyperextended  beyond  reason  (the  Supreme  Court  in  2003  declared  the  law  dumb  but  not 
unconstitutional),  none  of  this  dark  library  will  return  to  the  public  domain  (and  be  cleared 
for  scanning)  until  at  least  2019.  With  no  commercial  incentive  to  entice  uncertain 
publishers  to  pay  for  scanning  these  orphan  works,  they  will  vanish  from  view.  According  to 
Peter  Brantley,  director  of  technology  for  the  California  Digital  Library,  "We  have  a  moral 
imperative  to  reach  out  to  our  library  shelves,  grab  the  material  that  is  orphaned  and  set  it 
on  top  of  scanners." 

No  one  was  able  to  unravel  the  Gordian  knot  of  copydom  until  2004,  when  Google  came  up 
with  a  clever  solution.  In  addition  to  scanning  the  15  percent  out-of-copyright  public-domain 
books  with  their  library  partners  and  the  10  percent  in-print  books  with  their  publishing 
partners,  Google  executives  declared  that  they  would  also  scan  the  75  percent  out-of-print 
books  that  no  one  else  would  touch.  They  would  scan  the  entire  book,  without  resolving  its 
legal  status,  which  would  allow  the  full  text  to  be  indexed  on  Google's  internal  computers  and 
searched  by  anyone.  But  the  company  would  show  to  readers  only  a  few  selected 
sentence-long  snippets  from  the  book  at  a  time.  Google's  lawyers  argued  that  the  snippets  the 
company  was  proposing  were  something  like  a  quote  or  an  excerpt  in  a  review  and  thus 
should  qualify  as  a  "fair  use." 

Google's  plan  was  to  scan  the  full  text  of  every  book  in  five  major  libraries:  the  more  than  10 
million  titles  held  by  Stanford,  Harvard,  Oxford,  the  University  of  Michigan  and  the  New 
York  Public  Library.  Every  book  would  be  indexed,  but  each  would  show  up  in  search  results 
in  different  ways.  For  out-of-copyright  books,  Google  would  show  the  whole  book,  page  by 
page.  For  the  in-print  books,  Google  would  work  with  publishers  and  let  them  decide  what 
parts  of  their  books  would  be  shown  and  under  what  conditions.  For  the  dark  orphans, 
Google  would  show  only  limited  snippets.  And  any  copyright  holder  (author  or  corporation) 
who  could  establish  ownership  of  a  supposed  orphan  could  ask  Google  to  remove  the  snippets 
for  any  reason. 

At  first  glance,  it  seemed  genius.  By  scanning  all  books  (something  only  Google  had  the  cash 
to  do),  the  company  would  advance  its  mission  to  organize  all  knowledge.  It  would  let  books 
be  searchable,  and  it  could  potentially  sell  ads  on  those  searches,  although  it  does  not  do  that 
currently.  In  the  same  stroke,  Google  would  rescue  the  lost  and  forgotten  75  percent  of  the 
library.  For  many  authors,  this  all-out  campaign  was  a  salvation.  Google  became  a  discovery 
tool,  if  not  a  marketing  program.  While  a  few  best-selling  authors  fear  piracy,  every  author 
fears  obscurity.  Enabling  their  works  to  be  found  in  the  same  universal  search  box  as 
everything  else  in  the  world  was  good  news  for  authors  and  good  news  for  an  industry  that 
needed  some.  For  authors  with  books  in  the  publisher  program  and  for  authors  of  books 
abandoned  by  a  publisher,  Google  unleashed  a  chance  that  more  people  would  at  least  read, 
and  perhaps  buy,  the  creation  they  had  sweated  for  years  to  complete. 

6.  The  Case  Against  Google 

Some  authors  and  many  publishers  found  more  evil  than  genius  in  Google's  plan.  Two  points 
outraged  them:  the  virtual  copy  of  the  book  that  sat  on  Google's  indexing  server  and  Google's 
assumption  that  it  could  scan  first  and  ask  questions  later.  On  both  counts  the  authors  and 
publishers  accused  Google  of  blatant  copyright  infringement.  When  negotiations  failed  last 
fall,  the  Authors  Guild  and  five  big  publishing  companies  sued  Google.  Their  argument  was 
simple:  Why  shouldn't  Google  share  its  ad  revenue  (if  any)  with  the  copyright  owners?  And 
why  shouldn't  Google  have  to  ask  permission  from  the  legal  copyright  holder  before  scanning 
the  work  in  any  case?  (I  have  divided  loyalties  in  the  case.  The  current  publisher  of  my  books 
is  suing  Google  to  protect  my  earnings  as  an  author.  At  the  same  time,  I  earn  income  from 
Google  Adsense  ads  placed  on  my  blog.) 


One  mark  of  the  complexity  of  this  issue  is  that  the  publishers  suing  were,  and  still  are, 
committed  partners  in  the  Google  Book  Search  Partner  Program.  They  still  want  Google  to 
index  and  search  their  in-print  books,  even  when  they  are  scanning  the  books  themselves, 
because,  they  say,  search  is  a  discovery  tool  for  readers.  The  ability  to  search  the  scans  of  all 
books  is  good  for  profits. 

The  argument  about  sharing  revenue  is  not  about  the  three  or  four  million  books  that 
publishers  care  about  and  keep  in  print,  because  Google  is  sharing  revenues  for  those  books 
with  publishers.  (Google  says  publishers  receive  the  "majority  share"  of  the  income  from  the 
small  ads  placed  on  partner-program  pages.)  The  argument  is  about  the  75  percent  of  books 
that  have  been  abandoned  by  publishers  as  uneconomical.  One  curious  fact,  of  course,  is  that 
publishers  only  care  about  these  orphans  now  because  Google  has  shifted  the  economic 
equation;  because  of  Book  Search,  these  dark  books  may  now  have  some  sparks  in  them,  and 
the  publishers  don't  want  this  potential  revenue  stream  to  slip  away  from  them.  They  are 
now  busy  digging  deep  into  their  records  to  see  what  part  of  the  darkness  they  can  declare  as 
their  own. 

The  second  complaint  against  Google  is  more  complex.  Google  argues  that  it  is  nearly 
impossible  to  track  down  copyright  holders  of  orphan  works,  and  so,  it  says,  it  must  scan 
those  books  first  and  only  afterward  honor  any  legitimate  requests  to  remove  the  scan.  In  this 
way,  Google  follows  the  protocol  of  the  Internet.  Google  scans  all  Web  pages;  if  it's  on  the 
Web,  it's  scanned.  Web  pages,  by  default,  are  born  copyrighted.  Google,  therefore,  regularly 
copies  billions  of  copyrighted  pages  into  its  index  for  the  public  to  search.  But  if  you  don't 
want  Google  to  search  your  Web  site,  you  can  stick  some  code  on  your  home  page  with  a 
no-searching  sign,  and  Google  and  every  other  search  engine  will  stay  out.  A  Web  master 
thus  can  opt  out  of  search.  (Few  do.)  Google  applies  the  same  principle  of  opting-out  to  Book 
Search.  It  is  up  to  you  as  an  author  to  notify  Google  if  you  don't  want  the  company  to  scan  or 
search  your  copyrighted  material.  This  might  be  a  reasonable  approach  for  Google  to  demand 
from  an  author  or  publisher  if  Google  were  the  only  search  company  around.  But  search 
technology  is  becoming  a  commodity,  and  if  it  turns  out  there  is  any  money  in  it,  it  is  not 
impossible  to  imagine  a  hundred  mavericks  scanning  out-of-print  books.  Should  you  as  a 
creator  be  obliged  to  find  and  notify  each  and  every  geek  who  scanned  your  work,  if  for  some 
reason  you  did  not  want  it  indexed?  What  if  you  miss  one? 
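
The "no-searching sign" is most commonly a `robots.txt` file at the root of a site (a robots meta tag on a page serves a similar purpose). As a rough illustration, the Python sketch below uses the standard library's `urllib.robotparser` to check a site-wide opt-out the way a cooperating crawler might. The crawler name `ExampleBot` and the `example.com` address are placeholders, and the convention only works because crawlers choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# A site-wide "keep out" notice, as it might appear in a robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching; the names here are placeholders.
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
print(allowed)  # False: the site has opted out for every crawler
```

The burden in this scheme sits with the site owner, who posts the notice once in a known place; the paragraph above asks what happens when there is no such single place for books.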

There  is  a  technical  solution  to  this  problem:  for  the  search  companies  to  compile  and 
maintain  a  common  list  of  no-scan  copyright  holders.  A  publisher  or  author  who  doesn't  want 
a  work  scanned  notifies  the  keepers  of  the  common  list  once,  and  anyone  conducting 
scanning  would  have  to  remove  material  that  was  listed.  Since  Google,  like  all  the  other  big 
search  companies  —  Microsoft,  Amazon  and  Yahoo  —  is  foremost  a  technical-solution 
company,  it  favors  this  approach.  But  the  battle  never  got  that  far. 
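
The common list described above can be pictured as a shared registry that every scanner consults before it works through its queue. The Python sketch below is a hypothetical illustration only: the `NoScanRegistry` class and the identifier strings are invented for the example, and the article does not say how such a list would actually be built or governed.

```python
class NoScanRegistry:
    """Hypothetical shared list of works whose rights holders have opted out."""

    def __init__(self):
        self._opted_out = set()

    def opt_out(self, work_id):
        # A rights holder registers once; every scanner consults the same list.
        self._opted_out.add(work_id)

    def filter_scannable(self, work_ids):
        # Keep only works that nobody has listed, preserving the input order.
        return [w for w in work_ids if w not in self._opted_out]


registry = NoScanRegistry()
registry.opt_out("isbn:0000000001")  # placeholder identifier

queue = ["isbn:0000000001", "isbn:0000000002"]
print(registry.filter_scannable(queue))  # ['isbn:0000000002']
```

The point of the design is that the author notifies one list rather than each of a hundred scanners.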

7.  When  Business  Models  Collide 

In  thinking  about  the  arguments  around  search,  I  realized  that  there  are  many  ways  to 
conceive  of  this  conflict.  At  first,  I  thought  that  this  was  a  misunderstanding  between  people 
of  the  book,  who  favor  solutions  by  laws,  and  people  of  the  screen,  who  favor  technology  as  a 
solution  to  all  problems.  Last  November,  the  New  York  Public  Library  (one  of  the  "Google 
Five")  sponsored  a  debate  between  representatives  of  authors  and  publishers  and  supporters 
of  Google.  I  was  tickled  to  see  that  up  on  the  stage,  the  defenders  of  the  book  were  from  the 
East  Coast  and  the  defenders  of  the  screen  were  from  the  West  Coast.  But  while  it's  true  that 
there's  a  strand  of  cultural  conflict  here,  I  eventually  settled  on  a  different  framework,  one 
that  I  found  more  useful.  This  is  a  clash  of  business  models. 

Authors  and  publishers  (including  publishers  of  music  and  film)  have  relied  for  years  on 
cheap  mass-produced  copies  protected  from  counterfeits  and  pirates  by  a  strong  law  based 
on  the  dominance  of  copies  and  on  a  public  educated  to  respect  the  sanctity  of  a  copy.  This 
model  has,  in  the  last  century  or  so,  produced  the  greatest  flowering  of  human  achievement 
the  world  has  ever  seen,  a  magnificent  golden  age  of  creative  works.  Protected  physical  copies 
have  enabled  millions  of  people  to  earn  a  living  directly  from  the  sale  of  their  art  to  the 
audience,  without  the  weird  dynamics  of  patronage.  Not  only  did  authors  and  artists  benefit 
from  this  model,  but  the  audience  did,  too.  For  the  first  time,  billions  of  ordinary  people 
were  able  to  come  in  regular  contact  with  a  great  work.  In  Mozart's  day,  few  people  ever 
heard  one  of  his  symphonies  more  than  once.  With  the  advent  of  cheap  audio  recordings,  a 
barber  in  Java  could  listen  to  them  all  day  long. 

But  a  new  regime  of  digital  technology  has  now  disrupted  all  business  models  based  on 
mass-produced  copies,  including  individual  livelihoods  of  artists.  The  contours  of  the 
electronic  economy  are  still  emerging,  but  while  they  do,  the  wealth  derived  from  the  old 
business  model  is  being  spent  to  try  to  protect  that  old  model,  through  legislation  and 
enforcement.  Laws  based  on  the  mass-produced  copy  artifact  are  being  taken  to  the  extreme, 
while  desperate  measures  to  outlaw  new  technologies  in  the  marketplace  "for  our  protection" 
are  introduced  in  misguided  righteousness.  (This  is  to  be  expected.  The  fact  is,  entire 
industries  and  the  fortunes  of  those  working  in  them  are  threatened  with  demise. 
Newspapers  and  magazines,  Hollywood,  record  labels,  broadcasters  and  many  hard-working 
and  wonderful  creative  people  in  those  fields  have  to  change  the  model  of  how  they  earn 
money.  Not  all  will  make  it.) 

The  new  model,  of  course,  is  based  on  the  intangible  assets  of  digital  bits,  where  copies  are 
no  longer  cheap  but  free.  They  freely  flow  everywhere.  As  computers  retrieve  images  from 
the  Web  or  display  texts  from  a  server,  they  make  temporary  internal  copies  of  those  works. 
In  fact,  every  action  you  take  on  the  Net  or  invoke  on  your  computer  requires  a  copy  of 
something  to  be  made.  This  peculiar  superconductivity  of  copies  spills  out  of  the  guts  of 
computers  into  the  culture  of  computers.  Many  methods  have  been  employed  to  try  to  stop 
the  indiscriminate  spread  of  copies,  including  copy-protection  schemes,  hardware-crippling 
devices,  education  programs,  even  legislation,  but  all  have  proved  ineffectual.  The  remedies 
are  rejected  by  consumers  and  ignored  by  pirates. 



As  copies  have  been  dethroned,  the  economic  model  built  on  them  is  collapsing.  In  a  regime 
of  superabundant  free  copies,  copies  lose  value.  They  are  no  longer  the  basis  of  wealth.  Now 
relationships,  links,  connection  and  sharing  are.  Value  has  shifted  away  from  a  copy  toward 
the  many  ways  to  recall,  annotate,  personalize,  edit,  authenticate,  display,  mark,  transfer  and 
engage  a  work.  Authors  and  artists  can  make  (and  have  made)  their  livings  selling  aspects  of 
their  works  other  than  inexpensive  copies  of  them.  They  can  sell  performances,  access  to  the 
creator,  personalization,  add-on  information,  the  scarcity  of  attention  (via  ads),  sponsorship, 
periodic  subscriptions  —  in  short,  all  the  many  values  that  cannot  be  copied.  The  cheap  copy 
becomes  the  "discovery  tool"  that  markets  these  other  intangible  valuables.  But  selling 
things-that-cannot-be-copied  is  far  from  ideal  for  many  creative  people.  The  new  model  is 
rife  with  problems  (or  opportunities).  For  one  thing,  the  laws  governing  creating  and 
rewarding  creators  still  revolve  around  the  now-fragile  model  of  valuable  copies. 

8. Search Changes Everything

The  search-engine  companies,  including  Google,  operate  in  the  new  regime.  Search  is  a 
wholly new concept, not foreseen in version 1.0 of our intellectual-property law. In the words
of  a  recent  ruling  by  the  United  States  District  Court  for  Nevada,  search  has  a 
"transformative  purpose,"  adding  new  social  value  to  what  it  searches.  What  search  uncovers 
is  not  just  keywords  but  also  the  inherent  value  of  connection.  While  almost  every  artist 
recognizes  that  the  value  of  a  creation  ultimately  rests  in  the  value  he  or  she  personally  gets 
from  creating  it  (and  for  a  few  artists  that  value  is  sufficient),  it  is  also  true  that  the  value  of 
any work is increased the more it is shared. The technology of search maximizes the value of
a  creative  work  by  allowing  a  billion  new  connections  into  it,  often  a  billion  new  connections 
that  were  previously  inconceivable.  Things  can  be  found  by  search  only  if  they  radiate 
potential  connections.  These  potential  relationships  can  be  as  simple  as  a  title  or  as  deep  as 
hyperlinked  footnotes  that  lead  to  active  pages,  which  are  also  footnoted.  It  may  be  as 
straightforward  as  a  song  published  intact  or  as  complex  as  access  to  the  individual 
instrument  tracks  —  or  even  individual  notes. 

Search  opens  up  creations.  It  promotes  the  civic  nature  of  publishing.  Having  searchable 
works  is  good  for  culture.  It  is  so  good,  in  fact,  that  we  can  now  state  a  new  covenant: 
Copyrights  must  be  counterbalanced  by  copyduties.  In  exchange  for  public  protection  of  a 
work's  copies  (what  we  call  copyright),  a  creator  has  an  obligation  to  allow  that  work  to  be 
searched.  No  search,  no  copyright.  As  a  song,  movie,  novel  or  poem  is  searched,  the  potential 
connections  it  radiates  seep  into  society  in  a  much  deeper  way  than  the  simple  publication  of 
a  duplicated  copy  ever  could. 

We  see  this  effect  most  clearly  in  science.  Science  is  on  a  long-term  campaign  to  bring  all 
knowledge  in  the  world  into  one  vast,  interconnected,  footnoted,  peer-reviewed  web  of  facts. 
Independent  facts,  even  those  that  make  sense  in  their  own  world,  are  of  little  value  to 
science.  (The  pseudo-  and  parasciences  are  nothing  less,  in  fact,  than  small  pools  of 
knowledge that are not connected to the large network of science.) In this way, every new
observation or bit of data brought into the web of science enhances the value of all other data
points.  In  science,  there  is  a  natural  duty  to  make  what  is  known  searchable.  No  one  argues 
that  scientists  should  be  paid  when  someone  finds  or  duplicates  their  results.  Instead,  we 
have  devised  other  ways  to  compensate  them  for  their  vital  work.  They  are  rewarded  for  the 
degree  that  their  work  is  cited,  shared,  linked  and  connected  in  their  publications,  which 
they  do  not  own.  They  are  financed  with  extremely  short-term  (20-year)  patent  monopolies 
for  their  ideas,  short  enough  to  truly  inspire  them  to  invent  more,  sooner.  To  a  large  degree, 
they  make  their  living  by  giving  away  copies  of  their  intellectual  property  in  one  fashion  or 
another. 

The  legal  clash  between  the  book  copy  and  the  searchable  Web  promises  to  be  a  long  one. 
Jane  Friedman,  the  C.E.O.  of  HarperCollins,  which  is  supporting  the  suit  against  Google 
(while  remaining  a  publishing  partner),  declared,  "I  don't  expect  this  suit  to  be  resolved  in 
my  lifetime."  She's  right.  The  courts  may  haggle  forever  as  this  complex  issue  works  its  way 
to  the  top.  In  the  end,  it  won't  matter;  technology  will  resolve  this  discontinuity  first.  The 
Chinese  scanning  factories,  which  operate  under  their  own,  looser  intellectual-property 
assumptions,  will  keep  churning  out  digital  books.  And  as  scanning  technology  becomes 
faster,  better  and  cheaper,  fans  may  do  what  they  did  to  music  and  simply  digitize  their  own 
libraries. 

What  is  the  technology  telling  us?  That  copies  don't  count  any  more.  Copies  of  isolated  books, 
bound  between  inert  covers,  soon  won't  mean  much.  Copies  of  their  texts,  however,  will  gain 
in  meaning  as  they  multiply  by  the  millions  and  are  flung  around  the  world,  indexed  and 
copied  again.  What  counts  are  the  ways  in  which  these  common  copies  of  a  creative  work  can 
be  linked,  manipulated,  annotated,  tagged,  highlighted,  bookmarked,  translated,  enlivened 
by  other  media  and  sewn  together  into  the  universal  library.  Soon  a  book  outside  the  library 
will  be  like  a  Web  page  outside  the  Web,  gasping  for  air.  Indeed,  the  only  way  for  books  to 
retain  their  waning  authority  in  our  culture  is  to  wire  their  texts  into  the  universal  library. 

But  the  reign  of  livelihoods  based  on  the  copy  is  not  over.  In  the  next  few  years,  lobbyists  for 
book  publishers,  movie  studios  and  record  companies  will  exert  every  effort  to  mandate  the 
extinction of the "indiscriminate flow of copies," even if it means outlawing better hardware.
Too  many  creative  people  depend  on  the  business  model  revolving  around  copies  for  it  to 
pass  quietly.  For  their  benefit,  copyright  law  will  not  change  suddenly. 

But  it  will  adapt  eventually.  The  reign  of  the  copy  is  no  match  for  the  bias  of  technology.  All 
new  works  will  be  born  digital,  and  they  will  flow  into  the  universal  library  as  you  might  add 
more  words  to  a  long  story.  The  great  continent  of  orphan  works,  the  25  million  older  books 
born analog and caught between the law and users, will be scanned. Whether this vast
mountain  of  dark  books  is  scanned  by  Google,  the  Library  of  Congress,  the  Chinese  or  by 
readers  themselves,  it  will  be  scanned  well  before  its  legal  status  is  resolved  simply  because 
technology  makes  it  so  easy  to  do  and  so  valuable  when  done.  In  the  clash  between  the 
conventions of the book and the protocols of the screen, the screen will prevail. On this
screen, now visible to one billion people on earth, the technology of search will transform
isolated  books  into  the  universal  library  of  all  human  knowledge. 

Kevin  Kelly  is  the  "senior  maverick"  at  Wired  magazine  and  author  of  "Out  of  Control:  The 
New  Biology  of  Machines,  Social  Systems  and  the  Economic  World"  and  other  books.  He 
last  wrote  for  the  magazine  about  digital  music. 

Correction:  May  14,  2006 

An  article  on  Page  42  of  The  Times  Magazine  today  about  the  future  of  book  publishing  misstates  the  number  and  type  of  libraries  in 
China from which a Chinese company, Superstar, has made digital copies of books. It is 200 libraries of all kinds, not 900 university
libraries.

Brewster Kahle's modest mission:
Archiving  everything 

06/23/06 | By Elinor Mills

Brewster  Kahle  is  on  a  mission.  He  wants  the  whole  planet  to  have  access  to 
human  knowledge.  All  human  knowledge.  And  he's  striving  to  make  that 
possible, one byte at a time.

Ten  years  ago,  Kahle  founded  the  nonprofit  Internet  Archive,  with  the  goal  of 
preserving  the  hitherto  ephemeral  pleasures  of  the  Net  for  posterity.  But, 
unsatisfied  with  limiting  himself  to  the  saving  of  Web  sites,  Kahle  decided  to 
broaden  his  scope  and  include  existing  collections  of  books,  television  programs, 
movies  and  music  in  the  archive's  massive  digital  repository. 

In  addition  to  all  that  digitizing,  and  the  free  hosting  of  audio  and  video  content, 
the  archive  also  sponsors  the  SFLan.org  project,  which  offers  free  wireless  Internet 
in  San  Francisco. 


Kahle enthusiastically discusses his ambitious plan
to build, make freely accessible and preserve what
he calls, in reference to the legendary lost library
of the ancient world, the "Library of Alexandria,
V.2."


"Let's  have  a  library  system  that  is  in  the  great 
traditions  of  Thomas  Jefferson,  Andrew  Carnegie  and  the  Library  of  Alexandria," 
he  says  while  showing  a  reporter  around  the  Internet  Archive's  offices  in  San 
Francisco's  Presidio.  "If  we  are  able  to  build  that  library  again  with  the  vision  of 
the Greeks but the technology of the modern era, that's something to be proud of."

The  45-year-old  Kahle,  hyperarticulate  and  humble,  often  sports  a  quizzical 
expression  that,  with  his  spectacles;  graying,  curly  hair;  and  bushy  eyebrows,  lends 
him  a  quirky,  owlish  look.  He's  described  as  a  geek  by  friends,  but  a  balanced  one, 
whose  hobbies  range  from  sailing  with  his  wife  and  two  young  sons  to  spending 
time  at  a  theater  camp  in  Vermont. 

His favorite book is "The Autobiography of Benjamin Franklin," and he's recently
taken to listening to a musical group called The Ditty Bops, which he describes as
a  cross  between  The  Andrews  Sisters  and  The  Roches. 


An  alliance  for  free  access 

Kahle  relishes  his  role  as  Internet  archivist.  The  staggering  volume  of  material  to 
digitize (centuries of historic media, and new data appearing by the
minute) doesn't daunt him. Commercial interests whose monetizing efforts
threaten  free  universal  access  do.  So  he  readily  takes  up  the  cause  to  fight  for 
freely  accessible  information. 

"If  we  lose  (the  library  of  human  knowledge)  to  a  corporate  interest,  I  would  have 
screwed  up.  Having  it  go  to  corporate  hands  is  my  worst  nightmare,"  he  says. 

Which  brings  us  to  the  Open  Content  Alliance,  a  joint  effort  by  the  Internet 
Archive,  Yahoo  and  Microsoft  to  digitize  library  collections,  including  those  of 
the  University  of  California  system  and  The  University  of  Toronto.  Unlike  a 
similar  project  from  Google,  which  allows  users  to  read  the  digitized  content  only 
through  Google's  Web  site,  the  OCA  material  will  be  searchable  through  any 
service  and  everyone  will  be  encouraged  to  download  books. 

Also, the OCA is digitizing only books in the public domain, whereas Google is
including copyright-protected titles in its scanning efforts and will offer small
snippets of such texts to those searching its database. As a result of Google's
approach, groups representing authors and publishers have sued the search giant.

"Some  would  like  to  control  (information)  so  fewer  people  make  money  and  have 
access. This is not right," Kahle says. "I really want the Enlightenment-era ideal of
universal  education." 

Kahle is not opposed to companies turning a profit; he pocketed millions in 1995
when  AOL  bought  his  first  company,  WAIS,  one  of  the  first  Internet  search 
systems.  Much  of  that  windfall  went  to  fund  the  Internet  Archive,  which  has  an 
annual  budget  of  about  $5  million. 

"I'm  not  against  people  making  money.  In  fact,  it's  absolutely  essential,"  he  says, 
adding  that  there's  plenty  of  money  to  be  made  off  services  related  to  the 
distribution  of  free  information. 


ZDNet | Copyright © 2006 CNET Networks, Inc. All Rights Reserved


http://news.zdnet.com/2102-9588_22-6087167.html 



Library  Journal 

Release  of  Google  Contract  with  UC  Sparks  Criticism 

September 5, 2006

The  University  of  California  (UC)  has  released  the  terms  of  its  contract  with  Google  to  scan  books  in  UC's  library 
collections.  The  six-year  deal  involves  at  least  2.5  million  books,  with  UC  providing  at  least  600  books  per  day  for  the  first 
two months and more once the project is up to full capacity. In return for its participation, UC will receive one copy of
the  scans.  The  university,  however,  can't  share,  license,  or  sell  its  scans  to  any  third  party  and  can  redistribute  no  more 
than  ten  percent  of  scanned  material  to  other  libraries  or  schools,  even  for  educational  purposes — which  constrains 
interlibrary  loan.  The  release  of  the  terms  fulfills  a  request  from  the  Chronicle  of  Higher  Education  and  responds  to  a 
"general  interest,"  UC  said. 

Though  the  contract  does  not  seem  much  different  from  the  University  of  Michigan/Google  contract  made  public  last  year, 
it  still  sparked  criticism.  Internet  Archive  and  Open  Content  Alliance  (OCA)  founder  Brewster  Kahle  noted  that  the  contract 
shows  there  has  been  no  evolution  in  Google's  practices.  Kahle  stressed  that  the  Google  project  is  not  a  public  resource 
but  "the  private  library"  of  a  single  corporation,  while,  on  the  other  hand,  the  OCA  is  committed  to  openness.  "It's  a  little 
hard  for  me  to  understand,"  he  said  of  UC's  partnership  with  Google.  "Because  I  do  believe  they  understand  the 
difference." He added, "I hope it doesn't discourage those interested in the open sphere." Kahle also said, "If we have a retreat
where  libraries  don't  go  with  their  original  principle  of  public  service  and  just  try  to  sell  off  their  collections— to  the  lowest 
bidder,  for  a  copy  of  a  scan  they  can't  do  much  with — it  is  another  step  toward  the  privatization  of  the  library  system.  I'm 
going  to  put  every  effort  I  can  to  keep  a  public  library  system  alive."  He  said  the  OCA  was  progressing  and  urged  the 
library  community  to  consider  the  benefits  of  collaboration,  which  could  confront  the  book  industry's  concerns  about 
copyright.  If  the  average  book  is  300  pages,  for  $300  million  you  could  have  a  high-quality  ten  million-book  library,  he  said. 
"The  library  market  is  a  $12  billion  industry.  For  $300  million  a  universal  library  is  within  our  grasp,  technically  and 
financially.  The  question  is,  what  role  will  the  library  system  play  in  making  this  happen  and  at  the  end  of  the  day  will  we 
have a private library system or a public library system?"

[...] charts the moves of the recent deal between Google and the University of California. UC is the latest public institution to succumb to
[...] searchable.

From  the  University  of  California  press  release: 

"The  academic  enterprise  is  fundamentally  about  discovery,"  said  John  Oakley,  chair  of  UC's 
systemwide  Academic  Senate  and  a  UC  Davis  law  professor.  "We  contribute  to  it  immeasurably  by 
unlocking  the  wealth  of  information  maintained  within  our  libraries  and  exposing  it  to  the  latest 
that  search  technologies  have  to  offer." 

The  Open  Content  Alliance  is  an  international  coalition  of  cultural  institutions,  technology  vendors, 
nonprofits  and  governmental  organizations  from  around  the  world  that  hopes  to  create  a  permanent  archive  of 
multilingual  digitized  text  and  multimedia  content.  The  index  will  be  open  source,  so  that  anyone  can  build  a 
front  end  and  search  service  for  it. 

The OCA officially kicked off in October 2005, with participation from Yahoo and MSN. Google was rather
conspicuously  absent.  Ever  its  own  person,  in  November,  Google  donated  $3  million  to  seed  a  digitization 
project  by  the  United  States  Library  of  Congress. 

"The  OCA  principles  were  designed  around  discussions  I  had  with  Larry  Page,"  Kahle  says.  If  Google  won't 
participate,  he  wishes  it  would  at  least  "move  5  degrees  to  the  left.  Then  we'd  have  one  project,  and  it  would 
be  tremendous.  And  if  Yahoo  and  Microsoft  don't  think  it  goes  against  their  commercial  businesses,  then  why 
would  Google?  That's  the  puzzle." 

Google  is  the  Angelina  Jolie  of  commerce:  glamorous,  desirable,  powerful,  mysterious.  The  OCA  is  open, 
collaborative, inclusive. Will Brad (I mean, UC) come to regret this secret alliance?

In  theory,  there's  no  reason  that  UC  —  and  other  universities  and  public  libraries  —  couldn't  and  shouldn't 
work  with  Google  to  make  sure  their  books  are  searchable  within  the  world's  most-used  search  engine. 

In the Television Archiving blog, digital media consultant Jeff Ubois writes, "... as a practical matter, scanning
doesn't happen twice. ... This deal will be costly for UC in staff time and other resources, and the chances that
another vendor will come through and duplicate the work are slim."

He  told  me,  "Librarians  that  are  considering  mass  digitization  are  making  maybe  the  most  complex 
consideration  of  their  careers.  But,  in  negotiations  with  most  commercial  companies,  they  get  put  under  NDA 
so  they  can't  discuss  what's  happening  with  their  peers.  You  can't  expect  people  to  make  wise  decisions  in  a 
vacuum." 

Unfortunately,  the  public  good  may  not  be  able  to  compete  with  the  good  offered  by  private  enterprise,  to  wit: 
money.  An  April  agreement  between  Showtime  and  the  Smithsonian  gave  the  cable  broadcaster  near-exclusive 
access to Smithsonian archives, leading documentary filmmakers and scholars to protest.

[...] review the secret contract, the motivation became clear. According to a Washington [...]


The  Smithsonian  is  guaranteed  $500,000  a  year,  and  can  earn  additional  money  if  Smithsonian 
on  Demand,  Showtime  programming  based  on  Smithsonian  holdings,  is  popular  with  cable 
subscribers. 


Terms  of  the  UC  contract  with  Google  weren't  made  public,  but  the  search  goliath  has  been  willing  to  share 
revenue  from  search  ads  that  appear  alongside  book  search  results  with  publishers.  That  might  have  been  part 
of  the  lure  —  aside  from  the  irresistible  desire  for  almost  everyone  to  somehow  get  their  names  next  to 
Google's. 

Kahle  says  these  deals  with  Google  will  result  in  privatizing  the  world's  cultural  heritage. 

He  says,  "Google  could  be  helping  to  build  a  public  library  system  and  they've  decided  to  make  a  closed 
one.  We  can  understand  why  Google  would  want  to  do  this,  but  why  would  UC  want  to  spend  millions  of 
dollars helping a single corporation?"

Silicon Valley's technologists may have taken on the trappings of corporate America over the years,
but  their  counterculture  streak  will  be  alive  and  well  in  the  Nevada  desert  this  week. 

Among  the  tens  of  thousands  of  attendees  at  the  Burning  Man  art  festival  in  Nevada's  Black  Rock  desert  will  be 
programmers,  Web  designers  and  perhaps  more  than  a  few  millionaire  executives.  In  past  years,  Burning  Man 
attendees  have  included  Amazon.com  founder  Jeff  Bezos  and  Google  co-founders  Larry  Page  and  Sergey  Brin.  Even 
Google  CEO  Eric  Schmidt,  more  a  Baby  Boomer  than  a  member  of  Burning  Man's  Generation  X  crowd,  has  attended 
the  event. 

Plenty  of  other  big  tech  names,  from  Brian  Behlendorf,  the  primary  developer  of  the 
Apache Web server, to Brewster Kahle, founder of the Internet Archive, have also
gone  to  Burning  Man  over  the  years.  And  they  say  they've  been  there  for  one 
simple  reason:  The  same  sort  of  creativity  and  collaborative  thinking  they've  applied 
to  software  and  the  Internet  is  on  display  in  spades  at  the  desert  art  festival. 

In  fact,  some  would  argue,  Web  2.0-style  content  sharing  on  the  Internet  today 
looks  an  awful  lot  like  Burning  Man  community-building,  circa  1997. 

"I'd  say  the  same  types  of  people  who  want  to  share  things  openly  gravitated  to  the  Internet  and  the  openness  of 
Burning  Man,"  said  Kahle.  "There's  a  great  deal  of  overlap  in  the  (shared)  philosophy  of  'let's  make  things  and  share 
them.'" 

Of course, tech creativity and counterculture experimentation have gone hand-in-hand for decades. Last year, The New
York Times' John Markoff wrote "What the Dormouse Said," a best-selling book about the influence of '60s culture and
drugs on the emerging tech industry in Silicon Valley.

While  those  rule-breakers  have  (hopefully)  long  since  grown  up,  a  second  generation  of  boundary-pushing 
technologists  annually  make  their  way  to  the  desert  to  participate  in  the  spectacle  of  an  Oz-like  city  built  from  scratch 
on  scorched  playa. 

In  fact,  the  growth  of  Burning  Man  jibes  neatly  with  the  growth  of  the  World  Wide  Web,  from  the  1994  release  of  the 
first  Netscape  browser  to  today's  Web  2.0  start-ups.  As  Burning  Man  turns  20,  it's  noteworthy  to  look  at  just  how 
closely  intertwined  the  art-focused  festival  and  the  tech  industry  have  become. 

Burning  Man  began  in  1986  on  San  Francisco's  Baker  Beach.  In  1990,  it  moved  to  the  remote  Nevada  desert,  and  by 
1993, a year before Netscape's browser was released, Burning Man attracted about 1,000 people. By 1994, as the
Burning  Man  organization  put  up  its  first  Web  site,  the  population  swelled  to  2,000. 

In  1995,  it  doubled  again,  to  4,000,  and  the  next  year  it  was  up  to  8,000.  To  some,  there's  no  doubt  that  this  would  not 
have  been  possible  without  the  Web,  which  itself  was  exploding  in  popularity. 

"The  Internet  and  the  Web  is  what  caused  (Burning  Man)  to  be  such  an  international  event,"  said  Scott  Beale,  a 
longtime  Web  master  of  the  Burning  Man  Web  site  who  runs  the  Web  hosting  company  Laughing  Squid.  "There  weren't 
even any documentaries (yet). So how would these guys have gotten information on it? It's because of the Web site
and the photos online. The photos had to sell it."

"Evil Pippi," as one longtime member of Burning Man's media team calls herself (many participants adopt pseudonyms
and like to stay in character), agreed. "Global information meant global participation," she said. "I think the vivid
imagery that could be shared over the Web really attracted folks in droves."

On April 1, 1997, Burning Man launched burningman.com after two years of having its site on other people's
domains, said Marian Goodell, who runs communications for Burning Man and was influential
in  convincing  the  event's  organizers  to  build  and  own  its  own  domain.  Ironically,  the  site  launched  that  day  because 
organizers  knew  an  online  media  organization  was  about  to  publish  a  story  about  the  event. 

"We  loved  that  it  was  launched  on  (April  Fool's  Day),"  Goodell  said.  "We  had  to  launch  it  that  day  because  (the  news 
organization)  was  doing  a  story  on  us  and  we  knew  we  didn't  want  them  pointing  to  the  old  URL.  We  were  driven  to 
finish  the  first  draft  of  burningman.com  for  a  media  story." 

Goodell's understanding of the Internet and its widespread influence also led to Burning Man launching "Jack Rabbit
Speaks," a regular e-mail newsletter that participants could subscribe to and that exists to this day. In fact, she said,
the number of people who subscribe to it tends to be about the same as the number of people who come to the event
itself each year.

Perhaps even more important than the Jack Rabbit Speaks was the emergence of a free e-mail discussion list to which
anyone could subscribe and that became the center of many so-called Burners' connection to their growing community.
Even as the Internet was affecting Burning Man, Burning Man in turn was affecting the Internet, and technology. Some of
the earliest wireless Internet experiments were conducted during the event by people like Electronic
Frontier Foundation co-founder John Gilmore, and some of the earliest Web communities were created by
so-called Burners.

One example is Bianca, a Web community built around sharing many areas of interest. Perhaps it's best known,
however, for its commitment to the open discussion of smut, and for many years "Bianca's Smut Shack" was one of the
best-known Burning Man theme camps.

"I  don't  know  of  (many)  other  communities  that  were  online  at  that  time  (1994)," 
said  Evil  Pippi.  "Techies  knew  how  to  search  the  Web  to  find  (Burning  Man)  by  only  knowing  its  name." 

Kahle  agreed  that  the  early  Web  and  Burning  Man  communities  went  hand-in-hand. 

"The  communities  are  very  interchangeable,"  he  said.  "There's  a  great  deal  of  overlap;  the  open  aspects  of  the  Internet 
and  Burning  Man  come  from  the  same  place." 

Kahle also said that while people think of the Web as being dominated by commercial sites, the vast majority are
non-commercial, nurtured by many of the San Francisco early adopters who also happened to be Burners.

"Burning Man and the Internet ... disproved the 1980s myth that people will only do something if they're paid for it," he
said.  "With  Burning  Man,  people  would  work  for  weeks  and  months  and  years  to  build  things  to  just  have  done  them 
and  have  them  recognized  by  others." 

Over  the  years,  of  course,  countless  other  online  communities  sprouted  that  have  little  or  nothing  to  do  with  Burning 
Man.  And  as  many  more  people  attended  Burning  Man,  the  proportion  of  hard-core  Internet  and  technology  early 
adopters  from  the  San  Francisco  Bay  Area  diminished. 

But  that's  not  to  say  Burners  don't  still  have  a  deep  influence  on  new  technologies  and  new  forms  of  online  community. 
In fact, some would argue that the very notion of online social networks, which with the immense success of
MySpace.com has become mainstream, is something that originated with the Burning Man community.

Burners "were the earliest users of social networking in general," said Mark Pincus, founder of Tribe.net. "They were
already into social networking more than anyone else. In a way, Burning Man is a gigantic social network."

But  in  the  end,  said  Goodell,  the  Burning  Man  community  has  managed  to  find  a  way  to  embrace  technology,  without 
becoming  consumed  by  it. 

"We're  into  human  interaction  over  online,"  she  said.  But  "online  just  primes  us  for  face-to-face." 

Copyright ©1995-2006 CNET Networks, Inc. All rights reserved.

Microsoft, Joining Growing Digital-Library Effort, Will
Pay  for  Scanning  of  150,000  Books 

By JEFFREY R. YOUNG


With a $5-million commitment, Microsoft's MSN Search division is
joining universities and its online-search rival Yahoo in a
consortium dedicated to scanning millions of public-domain books.
The company's pledge will pay for the scanning of 150,000 volumes.

The  consortium,  called 
the  Open  Content 
Alliance,  was  announced 
three  weeks  ago  (Tlie 
Chronicle,  October  3). 
Since  then,  14  more 
universities  have  also 
joined,  promising  to 
contribute  money,  books, 
or  services  to  the  project. 
The  original  members  of 
the  alliance  include  the 
University  of  California, 
the  University  of 
Toronto,  and  several 
archives  and  technology 
companies. 

The  consortium's 
approach  stands  in  stark 
contrast  to  that  of 
Google's  Library  Project, 
which  is  scanning  both 
books  in  the  public 
domain  and  books  still 
covered  by  copyright. 
Google  says  its  search 
results  will  include  only 
snippets  of  text  for 
copyright-protected 
volumes,  as  well  as  links 
to  book-selling  sites. 
Nevertheless,  five 
publishers  have  sued  the 
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company,  claiming  copyright  infringement  (The  ChfGmcle. 
October  20). 

The latest institutions to join the Open Content Alliance are
Columbia University, Emory University, the Johns Hopkins
University libraries, the Prelinger Archives, Research Libraries
Group, Rice University, the Smithsonian Institution libraries,
the Universities of Pittsburgh and Virginia, and five Canadian
institutions: McMaster University, Memorial University of
Newfoundland, York University, and the Universities of British
Columbia and Ottawa.

Danielle Tiedt, a general manager at MSN, said the company
believed it was a good idea to join with rivals in the alliance to
scan books, so that books would not be scanned repeatedly by
competing companies and so that Microsoft could focus its
energies on improving its search technology. The company
also announced that it would create a new service, called MSN
Book Search, that is scheduled to begin next year.

Ms. Tiedt said that Microsoft planned to focus on adding book
collections that would provide answers to Internet search
requests that current online sources cannot provide. The
company estimates that more than 50 percent of online queries
go unanswered using today's search engines.

"The Web doesn't have all the answers," said Ms. Tiedt. "Given
that the Web only has about 10 years of information, it's not
surprising that it doesn't have all the answers to people's
questions."

The company has not yet decided which books it will scan, Ms.
Tiedt said, adding that the books might belong to institutions
that are not members of the alliance. "We're committed to
150,000, but we're completely free to choose where those
books come from," she said. "We're not constricted to just
working with people within the Open Content Alliance."

That  rival  companies  are  joining  the  alliance  is  evidence  that 
the  project  is  "fundamentally  a  mechanism  of  sharing,"  said 
Brewster  Kahle,  director  of  the  Internet  Archive,  a  nonprofit 
digital  library  that  is  coordinating  the  book  scanning  for  the 
alliance.

"An open library allows lots of people to participate," he added.

The World's Digital Curator


Brewster Kahle wants to build a universal library
for all of humankind, banking on the growing
trend of technological non-profits. His, called
the Internet Archive, has been using algorithms to
crawl the web for the past decade using a ranking
system to take snapshots of popular websites. This
search (spyware-like) technology is impressive given
its age. In 1983, while working at the startup,
Brewster created a system called Wide Area
Information Servers (WAIS). It was the world's first
incarnation of tag-and-rank, web-browser search.

Today, the Internet Archive is strengthened by the
new Open Content Alliance with partners like
Adobe, HP, Microsoft and Yahoo!. Brewster admits
these companies came on board primarily to stunt
the ambitious plans of Google, but their leverage is
vital to fill the universal library with petabytes of
archived movies, websites, and any type of recordable
information. Currently, the universal library is
without books, as new worldwide scanning efforts
like those from the Internet Archive are trying to
break through decades of corporate-minded copyright
laws.

Joi. Robinson: How did you find your way into the
computing world in the 19[illegible]s?
Kahle: Usually people ask how did I get into libraries
and it was always from computing. With computers,
if you do things right, you can move mountains and
in this case mountains are bits. The idea was to use
computers to build the digital library that we have
been promised for so long.


If you are a technologist, you have to use your tools
for something, because it is a tool. So there were two
great projects. One was cryptography to protect people's
privacy and the other was to build the library.
The encryption one, I couldn't figure out how to do
that in a way to help the common man - back in the
late '70s. It was too expensive. It would help basically
government institutions or big corporations or
illegal worlds and none of those really needed
help.


It was a spin off of MIT. We took the project that we
were doing and built a company around it. We tried
to build a computer. I helped design the chips and
boards and operating system for it and then tried to
use it for searching the library. And then the next
step was to use this internet as a distribution system
to get publishing going and that is what WAIS was. It
was the first internet publishing system.

[question illegible]

It was sold for parts. It was an astonishing group to
work with. The biggest problem was that we built a
parallel computer that not very many people knew
how to use. And now we are... gosh, what, 20 years
later, and the Internet Archive, Google, Hotmail,
have all built parallel computers.


Oh yeah, absolutely. After we got the hardware working,
we put a search engine on it and that was used by
Dow Jones to search through 400 newspapers and
articles. It was part of the Dow Jones' DowQuest and
it was a pretext search that really helped find patterns
and rank things. We found that we really needed the
internet publishing system to go, so that people
would feed it lots of materials. That was what WAIS
was about.

Computer visionary Brewster Kahle, founder of the Internet Archive and Open Content Alliance

Why did AOL buy WAIS in 1995 and how important
[illegible] in shaping today's world of [illegible]?

AOL had a royalty model at the time, which I loved. I
like the royalty model more than the advertising
model. So about 15 percent of their gross revenue
would go to the information service that kept the person
online. And they bought up a bunch of internet
companies at that time to build AOL 2.0. But then they
decided to not make internet 2.0 because their existing
business was going just fine and they all started to
adopt this cable model.

Will we ever get back to a royalty system?
I sure hope so. Otherwise we will end up with fairly
predictable results. Advertising systems build things
like the radio-, television-, magazine-type publishing
systems and they tend to conglomerate and so you end
up with very few of them at the end of the day.


[question illegible]

Well, I have never had more than one idea, so I just
keep at it, which is to build a library.

Here in Silicon Valley, many people think that corporations
are the answer to all problems and I think
that the library really belongs to a non-profit.
Corporations are very good at exploiting ideas or
assets. That is what they do. But libraries are fundamentally
different and we want it to be open. We want
it to be under the rule of law not under the rule of a
corporate structure.

Google is trying to build a corporate library. What is
your position on their copyright concerns?
Copyright is just an incarnation of a set of rules of
how businesses work and it has changed over time.
Ben Franklin's copyright was 14 years, renewable once.
And derivatives were not copyrighted. That was how
this country was founded, but it has gone bizarrely
wrong in the United States. I think in large part
because of the influence that corporations have had in
the making of law. It has caused a fantastic explosion
of government regulation. What copyright has seen in
the last 30 years is a tragic mistake.


You want to craft laws to support lots of creativity, lots of innovation, lots of economic
expansion and our current copyright laws are disastrous in these regards.

When did copyright start going wrong?

The first incredible mistake was in 1976. When I was growing up, you had to put a
little "c" in a circle to get copyright protection. In fact, you also had to send a copy to
the Library of Congress, otherwise you did not get protection. This seems to make
sense. In '76 they made it so that everything was copyrighted. My second grader's
scribblings are copyrighted for 170 years. This is nuts. In 1998, the United States
passed this wonderful piece of legislation called the Digital Millennium Copyright
Act, which, some people say, has the effect of redefining reading as copying. So that
the act of looking at something in the digital world is suddenly a copy. It is Orwellian.
I have had lawyers look at me in the face and say, "Reading is copying."
What is the best fix for copyright?

I think Ben Franklin was smart. Maybe these corporate lobbyists are smarter than
Ben Franklin and Thomas Jefferson, but I have my doubts. He was a printer and he
was out to make sure that people could publish and print. If we were to go back to
founders' copyright, I think, we would have more business and innovation.

What is the Open Library Alliance's relation to the Internet Archive?

The Open Content Alliance is a group of institutions that are working together to
build joint collections, out in the open. It is a project of the Internet Archive. The
Internet Archive facilitates the alliance. So Microsoft and Yahoo!, HP, Adobe, plus
about 30 libraries [six Canadian] at this point are all working to build joint
collections.

Were companies like Yahoo! and Microsoft eager to join to thwart Google?
Absolutely. The timing and the discussions have been galvanized by Google's bold
stance to digitize - and keep proprietary - several great libraries of the world.

What do you mean, keep proprietary?

Most of this is shrouded in secrecy and lawsuits, but there is a contract that is issued
out of the University of Michigan and put up on the Web, so you can see what the
restrictions are. As I understand it, it is on-campus use for Michigan but you cannot
download the materials for off-campus use.


We certainly can if we work together. Take the library system in the United States,
[which is funded with] $12 billion a year. About one third of that money goes to buying
books, so $3 or $4 billion goes to publishers. If we were to go and take that $12
billion and spend it differently, some of that money would still go to publishers and
there would be new electronic services. Even if you wanted to go out and scan a million
books, it costs about $30 million so it is not that big of a number [much cheaper
in developing worlds].

[illegible] built largely on static [illegible] pages? Do you need to change the infrastructure?
It is evolving rapidly in terms of how to make books useful on the internet. I would
say that we do not have very many examples yet. Amazon has shown, which is one of
the largest book scanning organizations, it helps promote the sale of books. The idea

Why is it important to digitally archive books?

It is really how people for the last 6,000 years, I guess since Sumerian tablets, passed
knowledge on from one generation to the next. In this digital world, we are finding
more and more people just use what is on the net. If it is not on the net, it is as if it
does not exist. And a lot of the treasures that humans have to offer their next generations
are not on the net yet. As libraries, we are really duty bound to bring the best
that we have to offer within reach of our children.


As more and more material [is published] on the net it is also less expensive to store
it. In 1996, we stored materials on tape kept off-line and now we are able to keep
them online and spinning and the costs just keep dropping. You also need a rising
budget. We found that by being frugal, we can operate the Internet Archive at
between $5 and $10 million per year.

The real threat to libraries over time is that they are burned. Sometimes they are
actually burned with matches and sometimes they are made irrelevant based on
changes in law or policy or in how people live. I will do everything that I can to show
why these laws make no sense to the long-term intellectual history of our species.

How will you do this?

Over the next 10 or 20 years, I hope there will be a handful of libraries centred in the
great cultures of the world that would actively collect the materials from their areas.
[They would] provide access to it and also make copies in the other archives of the
world, therefore, providing long-term preservation. Then you can contribute and
add your new ideas back into the library. That, I believe, is the opportunity of our
generation and we have the political will by being in an open society where universal
education is cherished. I want to spread universal access to all human knowledge.


Seminar  Series 




Although it is generally accepted as fact that printers are scrambling for market share, new
opportunities for growth are still wide open. Most printers have the raw materials in their
hands today for producing an engaging communications product using existing equipment
and with only a minor investment in changes to existing prepress workflow software.

In this 3-hour workshop you will learn how to create Rich Media PDFs from the digital content
already coming in to your shop. Consider the possibility of new revenue streams using
your core competencies.

As Online Libraries Are Formed,
Issues of Control, Privacy Are Posed




Google Inc. made news last week when it said it was launching a
service that would allow users to search newspaper archives going
back as far as the 18th century. Announcements like that are usually
applauded as an advance for the spread of knowledge. But Brewster
Kahle, a long-time Internet activist and founder of Internet
Archive, had some reservations. We asked him why.

What's not to like about Google making so much information freely
available?

The opportunity for universal access to all public knowledge is one
of the great opportunities of our times. And to the extent that
companies are helping us get there, that's terrific. Google is
making great strides in this direction; the basic goal is terrific
and their service is actually quite good.

The issue we have with what's being built is that we are creating
what is in effect a private library system. What we want, however,
is a public library system, one where we can have many different
points of view on the published literature of humankind. What we are
actually building might end up being controlled by a single
corporation. If this were some other industry - plastic or
software - I wouldn't be as worried about it. But we are talking
about the cultural heritage, the intellectual heritage, of humans.
And that's too important to be left to one company.

What exactly might the downside of this be?

One of the issues involves privacy. We are building a library; one
way of looking at the Internet is as a giant library. But when you
are tracking every book that has been checked out of this library,
and who checks it out and what else they might be doing, that raises
some very problematic issues with regard to privacy.

You work with the Open Content Alliance, which is funded in part by
Yahoo. Are you just carrying Yahoo's water in its fight with Google?


I work for a nonprofit organization. We get our money from the
Library of Congress and the Sloan Foundation. Also, to help pay for
the digitization of books, we get money from Yahoo and Microsoft.

What exactly is the Internet Archive?

There are several parts to it. The Million Book Project intends to
scan one million books from all over the world. Right now, we have
500,000. Another part, Project Gutenberg, has already scanned 10,000
books that are in the public domain. These are the world's greatest
hits. And we've scanned them in such a way that lets you search the
text of the book, but also see the beauty of the original printed
page.

Where do people find all this?

Go to archive.org, click on "Texts" at the top, then go perhaps to
"American Libraries" and click on "See recent additions." Look at
any book on that list; you can download the entire PDF for the book.

They look very nice. What do you use for scanners?

Instead of traditional scanners, we use what in effect are high-end
digital cameras. We have 75 people scanning in libraries all over
the world.
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As  Online  Libraries  Are  Formed, 
Issues  of  Control,  Privacy  Are  Posed 


Google  Inc.  made  news  last  week 
when  it  said  it  was  launching  a  ser- 
vice that  would  allow  users  to  search 
newspaper  archives  going  back  as  far 
as  the  18th  centui-y.  Announcements 
like  that  are  usually  applauded  as  an 
advance  for  the  spread  of  knowledge. 
But  Brewster  Kahle,  a  long-time  Inter- 
net activist  and  founder  of  Internet 
Archive,  had  some  reservations.  We 
asked  him  why. 

*      *■      * 
Wliat's  not  to  like  about  Google  mak- 
ing so  much  information  freely  avail- 
able? 

The opportunity for universal access to all public knowledge is one of the great opportunities of our times. And to the extent that companies are helping us get there, that's terrific. Google is making great strides in this direction; the basic goal is terrific and their service is actually quite good.

The issue we have with what's being built is that we are creating what is in effect a private library system. What we want, however, is a public library system, one where we can have many different points of view on the published literature of humankind. What we are actually building might end up being controlled by a single corporation. If this were some other industry - plastic or software - I wouldn't be as worried about it. But we are talking about the cultural heritage, the intellectual heritage, of humans. And that's too important to be left to one company.

What  exactly  might  the  downside  of 
this  be? 

One of the issues involves privacy. We are building a library; one way of looking at the Internet is as a giant library. But when you are tracking every book that has been checked out of this library, and who checks it out and what else they might be doing, that raises some very problematic issues with regard to privacy.

You work with the Open Content Alliance, which is funded in part by Yahoo. Are you just carrying Yahoo's water in its fight with Google?


I work for a nonprofit organization. We get our money from the Library of Congress and the Sloan Foundation. Also, to help pay for the digitization of books, we get money from Yahoo and Microsoft.

What  exactly  is  the  Internet  Archive? 

There are several parts to it. The Million Book Project intends to scan one million books from all over the world. Right now, we have 500,000. Another part, Project Gutenberg, has already scanned 10,000 books that are in the public domain. These are the world's greatest hits. And we've scanned them in such a way that lets you search the text of the book, but also see the beauty of the original printed page.

Where  do  people  find  all  this? 

Go to archive.org, click on "Texts" at the top, then go perhaps to "American Libraries" and click on "See recent additions." Look at any book on that list; you can download the entire PDF for the book.

They look very nice. What do you use for scanners?

Instead of traditional scanners, we use what in effect are high-end digital cameras. We have 75 people scanning in libraries all over the world.

Tech awards in San Jose turn spotlight on innovators

Four  from  Bay  Area  among  25  laureates  to  be  honored 

Dan  Fost,  Chronicle  Staff  Writer 

Thursday,  September  21,  2006 

Technology's  ability  to  change  the  world,  or  at  least  bring  about  some  improvement,  is  the  focus  of  25 
awards  announced  Wednesday  by  San  Jose's  Tech  Museum  of  Innovation. 

The  fifth  annual  Tech  Museum  Award  laureates  are  people  and  organizations  from  all  over  the  world, 
including  some  in  San  Francisco  and  Silicon  Valley  and  some  as  far  away  as  Nigeria,  Japan,  India,  England 
and  Brazil. 

In  addition,  Microsoft  Chairman  Bill  Gates  will  be  honored  with  the  museum's  Global  Humanitarian  Award  for 
the  work  the  billionaire  tech  titan's  foundation  has  done  for  health  and  education  around  the  world. 

The  idea  of  the  award  is  "to  extend  the  idea  of  Silicon  Valley  throughout  the  world,"  said  Peter  Friess, 
president  of  the  Tech  Museum  of  Innovation.  "A  lot  of  winners  come  from  developing  countries.  You  can  see 
that  small  ideas,  put  into  the  right  circumstances,  can  help  a  lot." 

The  projects  are  a  varied  lot,  from  an  Eritrean  minister's  improved  stoves  project,  a  Canadian  outfit's  effort 
to  extract  drinking  water  from  fog  in  arid  regions,  a  refrigerator  powered  by  evaporative  energy  for  use  in 
the  Nigerian  desert,  a  Brazilian  initiative  to  monitor  and  map  mosquito  populations  to  reduce  Dengue  fever, 
and  a  British  effort  to  outfit  and  supply  motorcycles  to  African  health  care  workers. 

The  awards  can  help.  "We  had  a  laureate  last  year  who  converted  hay  into  gold,"  Friess  said.  The  Canadian 
project  actually  looked  at  reusing  hay  as  insulation  in  home-building  in  Bulgaria. 

"He  was  very  passionate  about  the  idea.  He  wanted  to  help  the  farmers  come  up  with  a  new  way  to  make 
money,  instead  of  selling  the  hay  for  its  normal  use,"  Friess  said. 

"Once  he  received  the  Tech  award,  building  companies  for  houses  in  North  America  knocked  on  his  door. 

"I  wouldn't  say  the  tech  award  makes  people  rich,  but  it  helps  them  find  new  business  models." 

One  award  winner  who  welcomes  the  attention  is  Brewster  Kahle,  founder  and  digital  librarian  at  the 
Internet  Archive  in  the  Presidio  of  San  Francisco. 

"Standing  in  our  way,"  Kahle  said,  "is  not  technology,  but  is  just  getting  the  public  perception  that  this  goal 
is  within  our  grasp." 


The archive is partnering with libraries and digitizing about 500 books a day, Kahle said.

The  Tech  awards  will  be  presented  at  a  gala  dinner  at  the  museum  Nov.  15,  hosted  by  former  San  Francisco 
49er  Steve  Young,  himself  a  tech  entrepreneur  and  member  of  the  museum's  board  of  directors. 

One  winner  in  each  of  five  categories  will  receive  a  $50,000  cash  prize,  to  be  announced  at  the  dinner. 

The  awards  are  funded  by  some  of  the  tech  world's  most  prominent  companies,  led  by  Applied  Materials 
Inc.,  Intel  Corp.,  Microsoft  Corp.,  Accenture  and  Agilent  Technologies  Inc.  Santa  Clara  University's  Center  for 
Science, Technology and Society chose the winners from 951 entries.

Bay Area winners

MBA  Polymers,  Richmond:  An  environmental  award  for  developing  an  advanced,  energy-efficient  plastics 
recycling  process. 

Global  Connection  Project  Team,  based  in  Mountain  View  and  comprising  members  from  Google  Inc.,  NASA 
Ames  Research  Center  and  Carnegie  Mellon  University  in  Pittsburgh:  An  environmental  award  for  developing 
software  tools  for  use  with  Google  Earth  to  help  disaster  responders  get  accurate  and  timely  information 
during  recovery  efforts. 

Internet  Archive,  San  Francisco:  An  education  award  for  building  an  Internet  library  to  offer  permanent 
access  for  researchers,  historians  and  scholars  to  historical  collections  in  a  digital  format  (www.archive.org). 

Dominic  Massaro,  UC  Santa  Cruz:  An  education  award  for  developing  Baldi,  a  computer-animated  language 
tutor  for  people  with  autism  or  hearing  difficulties. 

A  full  list  of  winners  is  online  at  www.techawards.org. 

E-mail Dan Fost at dfost@sfchronicle.com.

http://www.sfgate.com/cgi-bin/article.cgi?file=/chronicle/archive/2006/09/21/BUGGNL9CUTl.DTL&type=printable

Archive-It 2: Internet Archive Strives to Ensure Preservation and Accessibility

Preserving seemingly ephemeral web content is a daunting task. The problem is even more difficult because the content of web pages changes and the pages themselves come and go with great frequency, which means simply collecting URLs isn't enough to keep tabs on valuable content. To help make digital content preservation possible, Internet Archive, a San Francisco-based nonprofit, has led a charge to effectively capture and store web content.

Since  its  inception  ten  years  ago,  Internet  Archive  has  focused  on  ensuring  the  availability  and  accessibility  of 
internet  content  by  creating  an  internet  library  to  permanently  store  digital  content  for  anyone  to  view  at  any  time. 
Beyond  the  content  it  has  chosen  to  preserve,  last  year  Internet  Archive  launched  a  service  called  Archive-It  to  help 
organizations seeking an easier way to archive valuable web content. The project recently released Archive-It 2 in its
continued  effort  to  archive  the  web. 

"It's  a  fallacy  that  if  something  is  on  the  web,  it  will  stay  there,"  says  Kristine  Hanna,  director  of  web  archiving 
services  for  Internet  Archive.  "It's  not  like  a  piece  of  paper  you  put  in  a  file  folder  and  it  will  be  there  forever.  There's 
an  urgent  need  for  people  to  understand  that  the  web  is  who  we  are.  It's  our  culture  and  our  social  fabric,  and  we 
don't  want  to  lose  any  of  it." 

At present, Internet Archive's complete library contains 65 billion pages of web content -- including books, moving images, and software (about 40,000 in each category). To archive material, Internet Archive uses a web crawler that scans the entire web for documents created during a specific time period. The documents are then catalogued and placed on the organization's servers. The content is stored in repositories around the world -- San Francisco, Egypt, Amsterdam, and France.

In mid-2005, Internet Archive launched the beta version of Archive-It, a web-based subscription service to help
"memory  institutions"  create  and  archive  their  own  web  collections,  in  order  to  provide  two  main  benefits.  First, 
these  institutions  are  able  to  preserve  their  desired  web  content.  Second,  their  collections  are  available  for  viewing 
by  the  general  public  on  the  Internet  Archive  site,  enabling  the  nonprofit  to  build  its  offerings  simultaneously. 
Archive-It  1  officially  launched  in  January,  followed  by  1.5  in  May  and  the  most  recent  point  release,  2,  in  late  July. 

Hanna  says  Archive-It  was  designed  mainly  for  institutions  (state  archives,  state  libraries,  and  university  libraries) 
that  have  a  mandate  to  archive  their  web  content  and  that  lack  the  resources  (staff,  budget,  and  technical 
capabilities)  to  do  so.  "We  are  collaborating  with  institutions  to  save  material  that  normally  wouldn't  be,  that  we 
probably  wouldn't  save  on  our  own,  that  they  couldn't  save  on  their  own,"  Hanna  says.  "We're  joining  forces  to  make 
sure  that  all  of  this  knowledge  is  not  lost." 

To  begin  creating  a  collection,  subscribers  can  select  as  many  as  300  websites  associated  with  a  particular  topic  to 
be  crawled,  and  Archive-It  can  be  programmed  to  crawl  those  sites  as  often  as  desired  (from  daily  to  weekly  to 
quarterly). Once the sites are archived, subscribers can search the archived web pages, either by text or URL; the pages look exactly as they did when they were captured on the web. Those searches can be conducted on
either  the  Internet  Archive  or  Archive-It  sites.  Users  can  search  by  a  variety  of  criteria,  including  subject,  date, 
relevance,  institution,  and  collection.  Advanced  search  options  include  the  ability  to  search  between  dates. 

Version 2 of Archive-It offers several new features not available in previous editions. Subscribers can now conduct test crawls, which enable them to see the type of web material that would populate a specific collection before it is archived permanently. There is also a metadata search capability, which allows metadata to be included in the text searches of materials in a collection. Archive-It Pro enables subscribers to set caps on how many web documents are collected from a website. It can also block the collection of materials from desired websites.

http://www.econtentmag.com/Articles/ArticlePrint.aspx?ArticleID=18132

The  collections  created  by  subscribers  cover  a  wide  range  of  subject  matter.  The  North  Carolina  state  government 
has  created  a  collection  of  web  pages  from  various  state  boards,  commissions,  and  agencies.  Indiana  University,  for 
example,  wanted  to  archive  all  of  the  university's  web  pages.  "They're  not  sure  what's  going  to  be  of  value  later," 
says  Hanna.  "They  are  able  to  capture  everything  now."  Subscribers  can  view  and  download  reports  regarding  the 
status of the crawls. Archive-It is available by annual subscription -- the most popular package is priced at $10,000
and  allows  subscribers  to  build  three  collections  with  up  to  10  million  URLs. 

Internet  Archive  has  plans  to  expand  the  reach  of  the  Archive-It  service  by  targeting  smaller  entities,  such  as 
independent researchers, local libraries, and small non-governmental organizations, with a lower-priced version. As
Hanna  says,  "The  Internet  Archive's  universal  approach  to  the  dissemination  and  access  of  information  is  embodied 
in  its  Archive-It  service  that  anybody  or  any  organization  can  use." 

(www.archive.org; www.archive-it.org)

Once materials are archived by Archive-It, subscribers can easily locate information. The results page resembles that of the major search engines.

Searchfeed.com has announced Bryan Brickley as senior director, GM of Searchfeed.com. Bryan has been a member of the Searchfeed.com team since 2002, most recently in the position of business development director. In his new role, Bryan will be responsible for the management of Searchfeed.com.

MuseGlobal  Inc.  has  announced  that  Frank 
Bilotto  has  filled  the  newly  created  position  of  VP, 
publishing  and  digital  media.  In  his  former  position 
as  head  of  publishing  at  Vivisimo,  Bilotto  introduced 
the  company's  clustering  technology  to  online  and 
internet  publishers  and  aggregators. 

HiSoftware has announced that John Rogers has joined the company as CFO and COO. Rogers brings to HiSoftware more than 22 years of financial and operations experience. Prior to joining HiSoftware, Rogers had been a partner at Henniker River Group since 2000.

Cabinet NG, Inc. has appointed John Allen as VP of marketing. Allen spent eight years as product marketing manager at Intergraph Corporation, where he secured an exclusive marketing agreement with Sun Microsystems. Between 2000 and 2004, Allen worked in a consulting role, assisting a number of start-up companies to launch new products.

ClearForest has announced that Thomas Tague has joined the company as VP of solutions and marketing. Tague joins ClearForest from Darwin Partners, where he served as EVP of client solutions and led sales and solutions delivery for the financial services, insurance, and life sciences markets.

Bogus Copyright Claim Squelches Free Speech, EFF Claims


By  Staff 

(AXcess News) Reno, NV - According to the Electronic Frontier Foundation (EFF), San Francisco-based Landmark Education, known for its Landmark Forum motivational workshops, is trying to suppress an investigative television news piece critical of its methods, using bogus copyright infringement claims to identify the source of the video posts.

"This is a classic example of using a bogus copyright claim to squelch free speech," said EFF Staff Attorney Corynne McSherry. "To the extent that the documentary uses any Landmark material, that use is clearly non-infringing. Landmark is simply trying to use the streamlined DMCA subpoena process to obtain the identities of its critics."

Using the alleged copyright violation as a pretext, Landmark subpoenaed three websites hosting the video - the Internet Archive, Google Video, and YouTube - seeking the identities of the anonymous uploaders. The Digital Millennium Copyright Act (DMCA) allows a content owner to issue a subpoena for the identity of an alleged infringer without first filing an actual lawsuit.


The Internet Archive is fighting its subpoena, and EFF filed official objections on its behalf Friday. Later this week, EFF will also file a motion to quash the subpoena issued to Google Video, on behalf of the anonymous speaker who uploaded the video. Google has advised Landmark that it will not produce the requested information pending a ruling on that motion. YouTube sent notification to the user about its subpoena and is giving the user a reasonable opportunity to move to legally nullify, or "quash," it.

"Sharing videos on the web is the latest example of free speech flowering on the Internet," said EFF Staff Attorney Kurt Opsahl. "Unfortunately, it is being met by a simultaneous rise in the use of baseless legal claims as an excuse to pierce anonymity and chill speech. This kind of intimidation has to stop."

Brewster Kahle is on a mission. He wants the whole planet to have access to human knowledge. All human knowledge. And he's striving to make that possible - one byte at a time.

Ten  years  ago,  Kahle  founded  the  nonprofit  Internet  Archive,  with  the  goal  of  preserving  the  hitherto  ephemeral 
pleasures  of  the  Net  for  posterity.  But,  unsatisfied  with  limiting  himself  to  the  saving  of  Web  sites,  Kahle  decided  to 
broaden  his  scope  and  include  existing  collections  of  books,  television  programs,  movies  and  music  in  the  archive's 
massive  digital  repository. 

In  addition  to  all  that  digitizing,  and  the  free  hosting  of  audio  and  video  content,  the  archive  also  sponsors  the 
SFLan.org project, which offers free wireless Internet in San Francisco.

Kahle enthusiastically discusses his ambitious plan to build, make freely accessible and preserve what he calls - in reference to the legendary lost library of the ancient world - the "Library of Alexandria, v. 2."

"Let's have a library system that is in the great traditions of Thomas Jefferson, Andrew Carnegie and the Library of Alexandria," he says while showing a reporter around the Internet Archive's offices in San Francisco's Presidio. "If we are able to build that library again with the vision of the Greeks but the technology of the modern era, that's something to be proud of."

The  45-year-old  Kahle,  hyperarticulate  and  humble,  often  sports  a  quizzical  expression  that,  with  his  spectacles; 
graying,  curly  hair;  and  bushy  eyebrows,  lends  him  a  quirky,  owlish  look.  He's  described  as  a  geek  by  friends,  but  a 
balanced  one,  whose  hobbies  range  from  sailing  with  his  wife  and  two  young  sons  to  spending  time  at  a  theater  camp 
in  Vermont. 

His  favorite  book  is  "The  Autobiography  of  Benjamin  Franklin,"  and  he's  recently  taken  to  listening  to  a  musical  group 
called  The  Ditty  Bops,  which  he  describes  as  a  cross  between  The  Andrews  Sisters  and  The  Roches. 

An  alliance  for  free  access 

Kahle relishes his role as Internet archivist. The staggering volume of material to digitize - centuries of historic media, and new data appearing by the minute - doesn't daunt him. Commercial interests whose monetizing efforts threaten free universal access do. So he readily takes up the cause to fight for freely accessible information.

"If  we  lose  (the  library  of  human  knowledge)  to  a  corporate  interest,  I  would  have  screwed  up.  Having  it  go  to  corporate 
hands  is  my  worst  nightmare,"  he  says. 

Which  brings  us  to  the  Open  Content  Alliance,  a  joint  effort  by  the  Internet  Archive,  Yahoo  and  Microsoft  to  digitize 
library  collections,  including  those  of  the  University  of  California  system  and  The  University  of  Toronto.  Unlike  a  similar 
project  from  Google,  which  allows  users  to  read  the  digitized  content  only  through  Google's  Web  site,  the  OCA  material 
will  be  searchable  through  any  service  and  everyone  will  be  encouraged  to  download  books. 

Also,  the  OCA  is  digitizing  only  books  in  the  public  domain,  whereas  Google  is  including  copyright-protected  titles  in  its 
scanning  efforts  and  will  offer  small  snippets  of  such  texts  to  those  searching  its  database.  As  a  result  of  Google's 
approach,  groups  representing  authors  and  publishers  have  sued  the  search  giant.





"Some  would  like  to  control  (information)  so  fewer  people  make  money  and  have  access.  This  is  not  right,"  Kahle  says. 
"I  really  want  the  Enlightenment-era  ideal  of  universal  education." 

Kahle  is  not  opposed  to  companies  turning  a  profit--he  pocketed  millions  in  1995  when  AOL  bought  his  first  company,
WAIS,  one  of  the  first  Internet  search  systems.  Much  of  that  windfall  went  to  fund  the  Internet  Archive,  which  has  an 
annual  budget  of  about  $5  million. 

"I'm  not  against  people  making  money.  In  fact,  it's  absolutely  essential,"  he  says,  adding  that  there's  plenty  of  money 
to  be  made  off  services  related  to  the  distribution  of  free  information. 

Beyond  his  librarian  and  archivist  role  at  the  Internet  Archive,  Kahle  serves  on  the  board  of  the  Electronic  Frontier
Foundation  and  on  the  national  digital  strategy  advisory  board  at  The  Library  of  Congress.  He's  also  a  plaintiff  in  Kahle 
v.  Gonzales  (formerly  Kahle  v.  Ashcroft),  a  federal  lawsuit  challenging  recent  copyright  term  extensions.  Kahle  lost  in 
the  lower  court  and  has  appealed. 

"What  Brewster  has  done  is  extraordinarily  significant,  because  he  produced  an  archive  of  material  that  otherwise  just 
wouldn't  have  existed,"  says  Stanford  Law  School  professor  Lawrence  Lessig,  who  spearheaded  the  Kahle  lawsuit. 
"There  have  been  many  collectors  of  cultural  works  out  there,  but  only  Kahle  is  collecting  the  Internet.  When  he 
started  collecting  it,  most  people  were  not  yet  convinced  that  there  was  anything  there  to  be  collected." 

Lessig  remembers  that  during  the  preparations  for  the  1999  court  challenge,  Kahle  and  his  son  drove  the  Internet 
Archive's  Bookmobile  from  San  Francisco  to  Washington,  D.C.,  to  attend  the  trial.  "They  stopped  at  high  schools  along 
the  way,  printing  books,  cutting  and  binding  them  for  people  to  take  for  free,"  he  says.  "That  was  (Kahle's)  way  to 
make  tangible  what  was  really  at  stake  in  the  public  domain." 

Rick  Prelinger,  a  writer  and  filmmaker  who  donated  a  collection  of  historical  films  to  the  Internet  Archive,  remembers 
how  easily  Kahle  recruited  him  to  the  cause  when  they  first  talked  on  the  phone  in  1999.  Brewster  said  he  had  just 
been  thinking  that  he  wanted  films  for  the  Internet  Archive,  Prelinger  says.

"'How  would  you  like  to  put  your  films  online  and  make  them  accessible  for  free?'"  recalls  Prelinger,  who  taught  for  a
time  at  the  School  of  Visual  Arts  in  New  York  City.  "We'd  only  known  each  other  for  five  seconds,  and  he  was  already 
suggesting  I  get  involved  in  the  West  Coast  gift  economy." 

"To  meet  Brewster  and  work  with  him  was  a  life  changing  experience,"  he  says.  Kahle  "thrives  on  access.  He  watches 
the  outbound  bandwidth  figures  for  the  Internet  Archive  to  see  how  many  bits  (of  data)  they  are  giving  away,  and  that's 
very  exciting  to  him,"  Prelinger  said.  "He's  a  zealot  about  bits  in  and  bits  out  and  collecting  and  disseminating 
information." 

Despite  some  hurdles,  Kahle  is  an  optimistic  man.  The  pieces  are  in  place  to  accomplish  his  dream,  he  says:  Internet 
technology  to  digitize  and  distribute  content;  ideals  of  universal  education;  and  political  will.

"With  those,  I  believe  we  can  build  a  great  library  of  humankind's  thoughts  and  dreams,"  he  said.
Copyright  ©1995-2006  CNET  Networks,  Inc.  All  rights  reserved.

A  fat  October  moon  shone  through  the  Presidio  treetops  the  night  Brewster  Kahle  launched
the  latest  shot  in  the  space-race  for  a  digital  library. 

"Let's  get  the  people's  books  back  to  the  people!"  said  Kahle,  standing  at  the  podium  inside 
the  Golden  Gate  Club. 

Founder  of  the  Internet  Archive,  Kahle  is  an  ebullient  technology  visionary  of  the  type 
Northern  California  cultivates.  He  has  been  widely  recognized  as  a  digital  guru  and  a 
catalyst  for  change. 

Now,  his  vision  is  helping  shape  the  debate  over  how  a  book  library  should  reside  on  the 
Internet.  His  idealistic  yet  pragmatic  approach  --  providing  free  digital  access  to  works  in
the  public  domain  --  could  be  a  bridge  to  detente  in  the  war  between  publishers  and  Google
Inc. 

While  Google  has  alienated  authors  and  publishers  with  its  plan  to  digitize  books  still  in 
copyright,  Kahle  has  moved  gingerly,  forging  collaborations  with  Google's  fiercest 
archrivals  --  Microsoft  and  Yahoo  --  to  create  a  kinder,  gentler  digital  library  effort  called
the  Open  Content  Alliance. 

The  alliance,  focused  on  books  no  longer  under  copyright  —  that  is,  books  published  before 
1923  —  echoes  the  computer  industry's  open  source  movement,  which  has  sought  to  spur 
innovation  by  enabling  software  engineers  to  freely  share  their  code. 

Google's  library  initiative,  the  Google  Print  Library  Project,  which  has  plans  to  digitize 
books  from  the  collections  of  their  partner  libraries  (the  New  York  Public  Library  plus  the 
libraries  of  Oxford  University,  Harvard,  Stanford  and  the  University  of  Michigan)  -- 
including  many  books  still  in  copyright  --  has  earned  the  ire  of  authors  and  publishers.

"Google  is  building  a  database  of  value  that  was  created  by  authors  and  publishers  and 
using  it  to  advance  the  interests  of  its  revenue-generating,  for-profit  search-engine 
operation,"  Allan  Adler,  vice  president  for  legal  and  government  affairs  of  the  Association 
of  American  Publishers,  which  has  filed  a  copyright  infringement  suit,  told  The  Chronicle 
by  phone  from  New  York. 

The  launch  of  the  Open  Content  Alliance  was  like  a  step  back  in  time  for  one  attendee.

"I  had  a  feeling  of  being  back  in  the  early  days  of  open  source  software  --  where  everybody
was  there  because  they  hated  Microsoft,"  said  Paul  Duguid,  a  visiting  scholar  at  UC 
Berkeley's  School  of  Information  Management  and  Systems.  "This  was  the  un-Google 
meeting." 

That  night,  Kahle  unveiled  his  new  book  scanner,  Scribe.  A  kind  of  portable  darkroom,  it 
looks  like  a  black-draped  office  cubicle.  Inside,  two  digital  cameras  peer  down  on  a  book 
held  in  a  V-shaped  glass  cradle. 

A  human  technician  turns  the  pages  and  works  the  cameras  via  foot  pedals.  (Automated 
systems  can  damage  precious  paper.)  The  results  are  super-high  resolution  photographs  at  a 
cost  of  10  cents  per  page.

A  handful  of  Scribe  machines  already  have  been  sent  to  the  library  of  the  University  of 
Toronto,  an  alliance  partner. 

But  more  revolutionary  than  the  book  scanners  --  variations  of  which  Stanford,  Google  and
others  have  developed  --  is  the  technology  Kahle  has  tested  with  his  Internet  Bookmobile,
which  enables  scanned  books  to  be  printed  out  and  bound  in  volumes  that  faithfully 
resemble  the  original. 

It  is  here  that  Kahle  and  the  Open  Content  Alliance  have  topped  Google  --  by  already
getting  real  books  into  readers'  hands,  even  in  book-starved  eastern  Africa. 

"It  doesn't  look  like  a  printout  with  a  staple,"  Kahle  said.  "It  doesn't  look  like  a  report.  It 
looks  like  a  book. 

"Maybe  I'm  old-fashioned,  but  I  still  love  books." 


By  light  of  day,  the  parking  lot  of  the  Internet  Archive's  headquarters  at  the  Presidio  hosts  a 
motley  array  of  vehicles,  including  Kahle's  favorite  invention,  the  Internet  Bookmobile,  a 
green  Ford  van  with  a  satellite  dish  on  the  roof  and  a  printer  and  bookbinding  contraption 
on  the  tailgate.  The  slightly  creaky  clapboard  building,  built  in  1857  as  a  military  residence 
and  store,  has  little  in  common  with  the  nearby  sleek  campus  of  George  Lucas'  Industrial 
Light  &  Magic.

"Where  are  the  machines?"  a  visitor  might  ask.  The  Internet  Archive's  souped-up  servers  --
storing  petabytes  of  information  (1  petabyte  equals  100  million  pages)  --  are  all  South  of
Market,  filling  three  warehouses  to  the  rafters. 

Kahle's  journey  started  at  the  Massachusetts  Institute  of  Technology,  where  he  studied 
artificial  intelligence.  After  graduating  in  1982,  he  helped  start  a  company  called  Thinking 
Machines.  By  1989,  he  had  invented  the  first  electronic  publishing  system,  WAIS  (Wide 
Area  Information  Server),  with  a  client  list  that  included  the  White  House,  the  Government 
Printing  Office,  both  houses  of  Congress,  the  Wall  Street  Journal  and  the  New  York 
Times. 

The  company  was  acquired  by  AOL  in  1995,  and  Kahle  decamped  for  San  Francisco, 
where  he  started  the  nonprofit  Internet  Archive  in  1996  to  serve  as  a  permanent  archive  of 
digital  work  --  Web  pages,  music,  books,  software  programs  --  available  free  to  scholars
and  researchers.  That  year  he  also  started  a  for-profit  arm,  Alexa  Internet,  a  tool  for
crawling  the  Web,  which  he  sold  to  Amazon.com  in  1999.

"My  interest  is  to  build  the  great  library,"  said  Kahle,  perching  briefly  in  the  conference
room  of  the  Internet  Archive  shortly  before  the  alliance  event.  "That  was  the  goal  I  set  for
myself  25  years  ago.  It  is  now  technically  possible  to  live  up  to  the  dream  of  the  Library  of 
Alexandria." 

That  storied  institution  on  the  Nile  delta  housed  all  the  world's  knowledge  until  its 
mysterious  destruction  1,600  years  ago.

"Folks  are  using  the  Internet  as  a  library,  and  they're  using  it  many  times  every  day,"  Kahle 
continued.  "We're  seeing  much  more  traffic  on  the  Internet  than  we  ever  did  in  our  public
library  system,  but  what's  available  on  the  Internet  isn't  the  best  we  have  to  offer.  Almost
everything  on  the  Internet  has  been  written  since  1996  --  and  most  of  it  has  been  written  for
the  Internet."  Kahle's  dream  is  to  collect  online  the  great  books  on  which  modern
civilization  is  based.

"Do  you  know  what's  carved  above  the  Carnegie  Library  in  Pittsburgh?  --  'FREE  TO  THE
PEOPLE'  --  what  a  goal!"  Kahle  said.  "I  can  believe  in  this!  At  the  Internet  Archive,  we
think  of  our  mission  as  'universal  access  to  all  knowledge.' 

"That  should  be  carved  over  our  door." 

Early  this  year,  Kahle  was  in  talks  with  Yahoo's  vice  president  for  search  technology, 
David  Mandelbrot,  and  Sumir  Meghani,  business  development  manager  of  the  Sunnyvale 
Internet  company. 

"We  wanted  to  figure  out  how  the  nonprofit  sector  could  work  with  the  commercial  sector," 
Kahle  said.  The  subject  of  "a  digital  library  of  Alexandria"  just  naturally  came  up. 

Yahoo  proposed  creating  a  freely  accessible  digital  library  that  would  include  only  books  in 
the  public  domain.

"After  that,  it  was  easy  to  know  how  to  proceed,"  Kahle  said. 

It  was  agreed  that  Yahoo  would  supply  the  search  engine  for  the  Web  site  and  index  the
books  scanned  by  the  Internet  Archive's  Scribe  machines.

The  Open  Content  Alliance  was  born.  By  October,  an  impressive  group  of  libraries  and
publishers  had  promised  to  participate,  including  the  Smithsonian,  Johns  Hopkins 
University,  University  of  Toronto,  British  National  Archives,  European  Archives,  O'Reilly 
Media  and  Prelinger  Archives  plus  multimedia  companies  LibriVox,  Octavo  and  others. 

The  University  of  California  already  has  started  its  contribution:  a  collection  of  18,000 
works  of  American  fiction,  which  librarians  are  selecting  from  the  10-library  statewide 
system.  Microsoft's  MSN  Search  has  promised  $5  million  toward  the  scanning  of  150,000 
books,  and  both  Adobe  and  Hewlett-Packard  will  contribute  advanced  digital  imaging. 

Kahle  hopes  to  have  "a  couple  of  great  collections  up  on  the  Web  by  the  end  of  2006." 

The  Google  Print  Library  Project  differs  from  Kahle's  in  an  important  way:  Google  is 
creating  not  a  library  but  a  vast  electronic  card  catalog. 

"We  have  been  very  clear  that  we  want  to  build  a  book-finding  tool,  not  a  book-reading 
tool,"  said  Jim  Gerber  of  Google  Print.

"Even  before  we  started  Google,  we  dreamed  of  making  the  incredible  breadth  of
information  that  librarians  so  lovingly  organize  searchable  online,"  co-founder  Larry  Page 
told  author  David  A.  Vise  in  his  new  book,  "The  Google  Story,"  out  this  month  from 
Delacorte. 

Back  in  October  2004,  they  took  the  first  step,  announcing  the  Google  Print  program  at  the 
annual  Frankfurt  Book  Fair.  It  would  allow  viewers  to  search  books  online  —  but  not  scan 
or  print  them  out  --  based  on  agreements  with  publishers.  A  similar  project,  Amazon's  free 
"Search  Inside  the  Book,"  had  already  proved  to  boost  book  sales. 

Since  Google's  main  source  of  revenue  is  its  signature  all-text  ads,  which  are  linked  by 
topic  but  are  separate  from  searched  content,  that  model  would  be  repeated.  Google  and  the 
publishers  would  split  the  proceeds. 

But  this  summer,  at  the  2005  Frankfurt  Book  Fair,  publishers  and  authors  were  bristling 
over  Google's  most  recent  announcement. 

A  new  project  --  Google  Print  Library  --  was  set  to  begin  digitizing  library  books,
including  many  still  under  copyright.  Only  snippets  of  text  would  be  viewable,  so  Google 
claimed  this  was  fair  use.  Google  also  considered  itself  under  no  obligation  to  ask 
copyright  holders'  permission  before  scanning  books. 

Publishers  saw  it  differently:  Because  entire  books  would  be  digitized  to  provide  such 
snippets,  they  feared  piracy  --  and  the  damage  that  free  file-sharing  has  done  to  the  music
industry.  Also,  publishers  were  wary  of  Google  having  the  biggest  online  library  in  the 
world  at  its  disposal  when,  in  the  future,  copyright  law  changes  to  adapt  to  the  Internet  age. 

In  August,  the  8,000-member  Authors  Guild  sued  Google  to  cease  and  desist;  the 
Association  of  American  Publishers,  with  300  members,  sued  for  copyright  infringement;
and  PEN  USA  and  the  International  Publishers  Association  issued  a  joint  declaration 
calling  the  Google  Library  Project  "in  breach  of  existing  copyright  law." 

In  response,  Google  suspended  book  scanning  for  three  months  to  give  authors  and 
publishers  time  to  be  excluded  if  they  feared  piracy. 

"Early  on  in  the  discussions  about  Google  Print,  that  was  one  of  the  fears  noted  most 
regularly,"  Gerber  said.  "Frankly,  that's  part  of  the  reason  we  changed  our  policy.  That's  the 
purpose  of  our  exclusion  option." 

On  Nov.  1,  Google  resumed  scanning  books.  Just  two  days  later,  Google  Print  announced 
the  availability  of  its  first  large  collection  of  books  --  all  in  the  public  domain.  And  this
week,  a  strategic  name  change  was  announced.  Google  Print  is  now  Google  Book  Search. 
A  posting  on  Thursday  by  Jen  Grant,  product  marketing  manager,  said:  "Why  the  change? 
Well,  one  factor  was  all  the  comments  we  got  about  how  excited  people  were  that  Google 
Print  would  help  them  print  out  their  documents,  or  Web  pages  they  visit  --  which  of
course  it  won't." 

Meanwhile,  in  anticipation  of  the  new  digital  marketplace,  two  other  companies  scrambled 
to  accommodate  fee-based  online  viewing.  Amazon  announced  Amazon  Pages  (unlike 
"Search  Inside  the  Book,"  a  fee  will  be  charged  for  page  viewing  of  certain  books).  And, 
separately,  one  of  the  nation's  largest  publishers,  Random  House,  set  a  price  for  future 
transactions  --  4  cents  per  page  for  viewing  more  than  5  percent  of  a  book.

"Brewster  Kahle  is  an  activist,  not  an  empire-builder,"  said  Paul  Saffo  of  the  Palo 
Alto-based  Institute  for  the  Future.

"What  I've  always  admired  about  Brewster  Kahle  is  his  attitude  --  'let's  get  the  job  done  and
find  out  what  the  wrinkles  are,' "  said  UC's  Duguid. 

"If  they  would  team  up  --  with  Google's  strength  and  Kahle's  philosophy,"  Duguid  mused,
"that  would  be  great."

When  asked  whether  an  association  with  Open  Content  Alliance  was  in  the  works,  Google 
spokesman  Nate  Tyler  said,  "We  are  talking  to  them,  but  there's  nothing  to  announce  yet." 
More  than  once,  Kahle  has  expressed  his  desire  to  see  Google  join  forces  with  the  project 
in  some  capacity. 

Before  taking  the  podium  at  the  Presidio,  Kahle  told  a  visitor,  "I  applaud  Google's  efforts. 
They've  got  a  bold  vision.  But  their  approach  seems  to  have  caused  lawsuits. 

"C'mon,  guys!  Let's  get  the  businesspeople  back  at  the  table,  and  send  the  lawyers  back  to 
their  cubicles!" 

Circulating  in  the  crowd  that  night  was  Kahle's  wife,  Mary  Austin,  founder  of  the  San 
Francisco  Center  for  the  Book,  and  their  two  sons.  It  was  the  end  of  a  long  day.  The  next 
morning,  the  family  was  set  to  fly  to  China,  where  Kahle  would  address  an  international 
conference  on  digital  libraries. 

"If  we  do  this  right,  it  will  be  remembered  as  one  of  the  great  things  humans  have  done,  up 
there  with  the  Library  of  Alexandria,  Gutenberg's  press  and  putting  a  man  on  the  moon," 
Kahle  said  in  closing.  "We're  going  step  by  step  —  first,  let's  see  if  we  can  get  the 
technology  right  so  that  you'll  actually  want  to  see  a  book  on  a  screen."

E-mail  Heidi  Benson  at  hbenson@sfchronicle.com.

This  story  is  from  AFP

Web  library  race  resumes 

Laurence  Benhamou  in  New  York 
October  30,  2006 

A  RACE  is  on  to  digitise  the  world's  books,  pitting  internet  juggernaut  Google  against  a  vast 
anti-Google  coalition  backed  by  rivals  Yahoo  and  Microsoft. 

In  late  August,  Google  restarted  its  Google  Book  Search  project  initiated  in  2004  with  the  lofty  aim  of 
scanning  every  literary  work  into  digital  format  and  making  them  available  online. 

Google  has  formed  partnerships  with  major  universities  such  as  Harvard,  Oxford,  the  New  York  Public 
Library,  Complutense  of  Madrid  and  the  University  of  California  to  add  their  collections  to  its  virtual 
book  shelves. 

In  mid-October  the  University  of  Wisconsin  made  its  extensive  selection  of  historical  works  available 
to  the  Mountain  View,  California-based  internet  powerhouse.

Google  has  stored  on  its  searchable  database  classic  works  in  the  public  domain,  along  with 
copyrighted  books  either  sent  with  or  without  the  publishers'  permission. 

Google  used  its  online  search  expertise  to  craft  search  boxes  that  use  keywords,  genres  and  authors 
to  find  works  as  opposed  to  the  romantic  practice  of  sifting  through  cards  in  a  library  reference 
index. 

Google  claimed  the  right  of  "freedom  of  quotation"  to  pull  up  search  results  from  books.

The  virtual  library  project  caused  an  outcry  from  publishers  and  authors  who  argued  Google  did  not
have  the  right  to  commandeer  their  works  for  free  distribution  online. 

Google  has  also  rejected  claims  that,  being  based  in  the  US,  it  has  favoured  English.  It  has  promised  it 
would  next  roll  out  a  Google  Book  Search  in  French. 

Opposition  to  the  project,  particularly  by  French  and  US  editors,  resulted  in  a  group  of  book 
publishers  forming  the  Open  Content  Alliance  (OCA)  in  October  of  2005. 

The  OCA  is  a  non-profit  organization  which  joins  together  an  array  of  universities,  foundations,  and 
data  processors  to  create  a  "common  pot"  of  digitized  books  available  online  for  download  or 
printing. 

The  proposed  collection  of  works  contributed  by  members  would  consist  of  35,000  works,  including 
those  of  precursors  such  as  the  Gutenberg  Project.

"The  question  is  whether  the  knowledge  of  the  world  will  be  property  of  a  private  company  or  open
to  all,"  Open  Content  Alliance  founder  Brewster  Kahle  said.  "Google  thinks  public  is  private."

"Everybody  can  make  money  out  of  it,"  Mr  Kahle,  who  is  also  president  of  a  site  called  the  Internet
Archive,  said.  "We  hope  to  see  many  search  engines."

Initially  backed  by  Yahoo,  which  was  to  tailor  a  search  engine  and  finance  converting  18,000  books 
to  digital  format,  the  alliance  was  quickly  joined  by  technology  titan  Microsoft. 

The  world's  leading  computer  software  company  promised  to  contribute  150,000  digitized  books  to 
the  OCA  collection. 

Microsoft  also  plans  to  launch  its  own  large-scale  virtual  book  search  engine  called  Windows  Live 
Book  Search  "later  this  year,"  and  begin  forming  its  own  collection  of  works.

Microsoft  followed  Google's  lead  by  asking  editors  to  submit  their  books  to  be  scanned  into  digital 
format  free  of  charge.

"Microsoft  will  be  more  closed,"  said  Mr  Kahle,  who  is  eager  to  see  the  Redmond,  Washington-based
firm's  budding  project. 

Microsoft  was  working  double-time  to  catch  up  with  Google  in  the  virtual  books  department. 

In  mid-October  Microsoft  signed  a  deal  with  Kirtas,  a  manufacturer  of  high-speed  scanners  capable  of 
digitizing  an  average-length  book  in  eight  minutes. 

Microsoft  also  arranged  to  digitise  the  contents  of  the  Cornell  University  library. 

Neither  Google  nor  Microsoft  would  reveal  how  many  books  they  have  already  scanned. 

"In  the  thousands,"  was  the  only  hint  Google  would  give. 

At  stake  for  the  companies  were  advertising  revenues  that  could  be  raked  in  from  book-seeking 
internet  surfers. 

"We  are  looking  into  the  possibility  of  incorporating  ads  into  the  Windows  Live  Book  Search  platform 
sometime  in  the  future,"  Microsoft  said.

The  outcome  of  the  battle  of  the  online  libraries  will  undoubtedly  hinge  on  court  decisions  regarding 
copyright  protections,  and  which  search  engine  wins  over  the  most  coveted  collections  of  written
works. 

The  Open  Content  Alliance  hopes  to  recruit  the  National  Library  of  France,  where  90,000  books  have 
already  been  scanned.

Behind  the  random  silliness  of  YouTube  videos  and  the  juvenile  frivolity  of  MySpace  Web
sites  lies  a  powerful  idea:  Everyday  people  are  using  technology  to  gain  control  of  the
media  and  change  the  world.

At  least  that's  what  a  new  breed  of  Internet  technologists  and  entrepreneurs  want  us  to 
believe.  The  new  Internet  boom  commonly  referred  to  as  Web  2.0  is  really  an  exercise  in 
digital  democracy. 

Dubbed  Digital  Utopians  by  some,  and  Web  2.0  innovators  by  others,  this  latest  wave  of 
tech  gurus  champion  community  over  commerce,  sharing  ideas  over  sharing  profits.  By 
using  Web  sites  that  stress  group  thinking  and  sharing,  these  Internet  idealists  want  to 
topple  the  power  silos  of  Hollywood,  Washington,  Wall  Street  and  even  Silicon  Valley. 
And  like  countless  populists  throughout  history,  they  hope  to  disperse  power  and  control, 
an  idea  that  delights  many  and  horrifies  others. 

Tim  O'Reilly  is  the  founder  and  chief  executive  of  O'Reilly  Media  in  Sebastopol,  a  tech 
publisher  and  event  organizer  who  hung  a  name  on  the  movement  with  his  Web  2.0 
Conference,  which  will  be  held  this  year  starting  Tuesday  in  San  Francisco.  In  his 
manifesto  on  the  movement  last  fall,  O'Reilly  wrote  glowingly  about  "the  wisdom  of 
crowds"  and  the  "architecture  of  participation." 

Winners  on  the  Internet  "have  embraced  the  power  of  the  web  to  harness  collective 
intelligence,"  O'Reilly  wrote,  populating  "a  world  in  which  'the  former  audience,'  not  a  few
people  in  a  back  room,  decides  what's  important." 

Indeed,  millions  of  people  each  month  visit  social  networking  destinations  like  MySpace, 
online  encyclopedias  like  Wikipedia  and  video-sharing  sites  like  YouTube.  Political  groups 
like  MoveOn.org  have  galvanized  grassroots  organizing.  News  aggregators  like  Digg.com 
have  given  editing  power  to  readers.  Combined,  these  Web  sites  have  changed  the 
landscape  of  countless  industries  and  some  have  become  worth  billions. 

They  have  also  tapped  a  nerve,  resonating  with  people  who  feel  powerless  to  affect  the 
major  power  structures.

The  core  of  the  Web  2.0  movement  resurrects  an  age-old  debate  about  governance  and
democracy,  one  that  was  argued  by  political  philosophers  such  as  Jean-Jacques  Rousseau
and  Alexis  de  Tocqueville:  Are  the  benefits  of  democracy  --  taking  advantage  of  what  Web
2.0  proponents  call  the  wisdom  of  the  crowds  --  worth  risking  the  dark  side  of  mob  rule?

Chris  Messina  thinks  so.  Messina,  26,  is  a  blogger,  activist  and  "open-source  evangelist,"  a 
charismatic  geek  spreading  the  notion  of  digital  democracy. 

"There  is  more  potential  today  for  individuals  to  change  their  destiny  than  there's  been  in 
ages,"  Messina  said.  "We  need  to  get  back  to  the  idea  that  anyone  can  dream  big  and  make 
it  happen." 

Like  any  popular  movement,  it  also  has  its  critics. 

Andrew  Keen  warns  against  the  dangers  of  embracing  technology's  level  playing  field. 
Keen,  46,  a  former  professor  and  philosopher  turned  tech  entrepreneur,  published  a  tract 
this  year,  "Web  2.0  Is  Reminiscent  of  Marx,"  and  is  working  on  a  book  lambasting  "The 
Cult  of  the  Amateur." 

If  people  are  absorbed  with  content  created  by  fellow  amateurs,  Keen  argues,  will  they  ever
know  greatness?  If  bloggers  disrupt  mass  media,  will  they  follow  journalistic  rules  of 
fairness?  Can  an  army  of  amateur  journalists  adequately  replace  corporate  news-gathering? 
Will  sophomoric  YouTube  videos  take  the  place  of  great  films? 

Keen  dismisses  what  he  calls  the  "militant  and  absurd"  buzzwords  of  Web  2.0: 
Empowering  citizen  media,  radically  democratize,  smash  elitism,  content  redistribution, 
authentic  community. 

Yet  those  buzzwords  pop  up  nearly  everywhere.  Google's  deal  to  buy  video  startup 
YouTube  for  $1.65  billion  provides  a  glimpse  at  the  potential  future  of  media,  one  in  which 
programs  are  made  not  by  polished  production  companies  but  by  anyone  with  a  low-cost 
camcorder.  In  this  world,  programming  decisions  rest  with  what  Web  2.0  calls  the 
community  and  what  critics  deride  as  the  mob  instead  of  with  cadres  of  networking 
executives. 

Rather  than  turning  to  the  mass  media,  people  can  get  their  information  from  blogs, 
podcasts  and  even  MySpace  pages.  Rather  than  dialing  up  a  radio  station  churning  out  a 
corporate  playlist,  anyone  can  put  his  idiosyncratic  broadcasts  online  at  sites  like  Mercora 
and  Lala.com. 

"Everything  is  getting  flatter,"  said  Ori  Brafman  of  San  Francisco,  an  entrepreneur  and 
author.  "The  amateurization  is  a  wonderful  thing." 

Brafman  is  co-author  of  "The  Starfish  and  the  Spider,"  a  book  about  "the  unstoppable 
power  of  leaderless  organizations."  Spiders,  in  the  title  metaphor,  are  crippled  or  dead  when 
they  lose  a  limb;  starfish,  on  the  other  hand,  can  grow  new  arms.  In  the  book,  Wikipedia  --
a  Web  site  on  which  users,  not  experts,  write  and  edit  a  free  online  encyclopedia  --  is
identified  as  a  classic  starfish  organization. 

"When  you  put  people  in  this  kind  of  an  open  system,  it  brings  out  a  different  side  of 
people,"  Brafman  says.  "They  know  the  system  is  based  on  trust  and  shared  responsibility. 
People  step  up  to  the  bat  and  perform." 

Not  exactly,  says  Nicholas  Carr,  an  author  and  blogger  who  joins  Keen's  contrarian  view 
of  Web  2.0.

Carr  calls  Wikipedia  "at  once  a  major  achievement  and  a  mediocre  piece  of  work."

"Does  something  like  Wikipedia  change  the  economics  of  producing  an  encyclopedia 
(written  by  experts)?  As  soon  as  you  have  a  mediocre  product  that's  free,  is  that  going  to 
destroy  the  possibility  of  having  a  superior  product  that  costs  something?"  he  asks. 

Idealism  in  technology 

The  notion  of  a  Digital  Utopia  has  been  around  since  the  early  days  of  computing.  Many  of 
the  industry's  pioneers  saw  their  machines  as  tools  that  could  make  life  easier  and  unite 
humanity. 

Some  of  the  earliest  successes  on  the  Internet  have  an  almost  direct  lineage  to  the  liberal 
politics  of  the  1960s  anti-war  and  civil  rights  movements.  Former  Merry  Prankster  Stewart 
Brand  was  the  brains  behind  the  Well,  an  early  online  community  that  took  its  name  in  part 
from  his  Whole  Earth  Review.  New  York  Times  technology  reporter  John  Markoff  argues 
in  his  book,  "What  the  Dormouse  Said,"  that  the  '60s  counterculture  played  a  pivotal  role  in 
shaping  the  personal  computing  industry. 

Underpinning  the  technology  movement  has  always  been  a  sense  of  community.  "Take  the 
Summer  of  Love,"  said  Brewster  Kahle,  45,  founder  of  the  Internet  Archive  in  San 
Francisco,  a  digital  archiving  and  storage  site.  "A  tradition  was  started  then  of  recording 
concerts  of  those  bands  and  passing  the  tapes  around.  But  they  had  a  firm  rule  that  you 
couldn't  make  any  money.  If  you  didn't  make  any  money  at  all,  it  was  OK  to  share  the 
love." 

That  tradition  has  migrated  to  the  online  world,  Kahle  said,  not  just  to  his  site  but  to  a 
renewed  resistance  to  copyright  and  a  love  of  sharing  information,  whether  it  is  music  or 
knowledge. 

The  Utopian  life 

Today's  Digital  Utopian  takes  many  forms,  from  the  aging  '60s  hippie  to  the  tech-savvy 
youthful  idealist.  They  share  little  physically,  but  most  everything  mentally. 

The  Utopians  attend  loosely  organized  gatherings,  often  called  un-conferences,  where  there 
is  no  agenda  other  than  participation.  Messina  and  his  girlfriend,  Tara  Hunt,  have 
spearheaded  a  series  of  events  known  as  BarCamp,  where  ad-hoc  groups  bring  sleeping 
bags  and  food  to  an  office  that  a  company  has  donated  for  the  occasion,  and  then  stay  up 
late  engaging  in  freewheeling  roundtable  discussions  about  how  to  use  the  latest 
technological  innovations.  BarCamp  events  have  been  held  around  the  world;  when  one  is 
arranged,  it's  posted  online  at  www.barcamp.org,  and  anyone  can  show  up. 

"Most  of  these  un-conference  things  are  a  big  experiment,"  Messina  said  at  WineCamp,  at 
which  geeks  gathered  in  a  Calaveras  County  vineyard  to  figure  out  ways  they  could  use 
technology  to  help  nonprofit  groups.  "It's  up  to  each  one  of  us  to  make  it  interesting." 

Utopians'  favored  reading  material  includes  "The  Long  Tail,"  in  which  Wired  editor  Chris 
Anderson  contends  the  Internet  makes  it  possible  to  no  longer  rely  on  hits  (most  notably 
movies,  books  and  music)  but  to  make  big  bucks  by  selling  niche  products  along  the  tail  of 
the  demand  curve. 

"People  are  taking  advantage  of  all  of  these  powerful  forces,"  he  said.  "The  world  is 
changing."

The modern Digital Utopian uses the Internet constantly and extensively to share
information and ideas at social networking sites and communities.

It  all  adds  up  to  a  shared  experience,  and  a  generally  shared  philosophy  centered  on 
technology  and  idealism.  And  where  there  is  a  point  of  view,  a  counterpoint  is  sure  to 
emerge. 

The  critics  and  the  co-opters 

Web  2.0  has  inspired  its  share  of  critics,  social  commentators  who  wonder  if  something 
valuable  may  get  lost  in  the  online  hubbub. 

Can  the  community  produce  a  collaborative  decision?  Not  usually,  Keen  says,  contending 
that  often  a  great  leader  is  needed  to  take  charge,  whether  it's  Bismarck  uniting  Germany  or 
Steve  Jobs  developing  the  iPod.  These  things  don't  happen  by  putting  every  decision  up  for 
a  vote,  like  a  California  ballot  measure. 

Keen, who spurred the debate by coining the phrase Digital Utopian, runs a curmudgeonly
blog, The Great Seduction, on the intersection of technology, media and culture, at
andrewkeen.typepad.com.

Oddly  enough,  the  Utopians  may  become  victims  of  their  own  success.  While  they 
advocate  a  world  in  which  people  can  share  content  without  concern  for  profit,  much  of 
what  they  are  creating  is  becoming  a  tool  of  the  corporate  culture  they  decry. 

MySpace,  the  most  popular  social  networking  site,  is  owned  by  Rupert  Murdoch's  News 
Corp.  YouTube,  which  was  built  so  that  people  could  share  their  videos  with  each  other, 
has  been  bought  by  Google.  And  even  though  Google's  motto  is  "Don't  be  evil,"  it  is  a 
publicly  traded  company  with  a  fiduciary  duty  to  make  money. 

O'Reilly  says  this  is  the  natural  order  of  things. 

He  said  the  term  came  from  the  wreckage  of  the  dot-coms  several  years  ago.  Even  though 
many  businesses  folded,  the  Internet  continued  to  grow,  mature  and  become  more 
indispensable. 

Yet  while  people,  perhaps  reacting  to  the  greed  that  fueled  the  IPOs  of  the  dot-com  years, 
saw  in  Web  2.0  a  chance  to  create  a  new  collectivism,  O'Reilly  said,  "I  don't  see  it  that  way 
at  all." 

Web  2.0,  he  says,  is  about  business. 

He  says  many  tech  movements  start  out  with  similar  idealism,  only  to  give  way  to 
capitalism.  For  instance,  O'Reilly  says,  Napster  introduced  file  sharing,  but  now  iTunes  has 
people  comfortable  with  paying  for  music  online. 

"You  do  a  barn  raising  at  a  particular  stage  of  society,"  he  said,  "and  then  the  developers 
come  in.  ...  It  always  happens  that  way." 

E-mail Dan Fost at dfost@sfchronicle.com.

Some of the key people in the Digital Utopian and Web 2.0 movements. Where possible,
the list includes links to their blogs.

Chris  Anderson,  Wired  magazine  editor  who  published  a  seminal  piece  in  Wired  in  2004, 
"The  Long  Tail,"  that  showed  how  the  Web  is  full  of  tiny  sites  and  niche  opportunities.  He 
blogged about it and wrote a book of the same title. www.thelongtail.com

Don't  confuse  him  with  the  other  Chris  Anderson,  who  puts  on  the  annual  TED  conference 
in  Monterey  and  is  also  influential  in  the  movement,  particularly  as  his  TED  prize  seeks  to 
add  some  social  conscience  to  what  was  traditionally  a  gathering  of  business  and  ideas 
people. tedblog.typepad.com.

Heather  Armstrong,  one  of  the  leading  "mommy  bloggers,"  who  was  famously  fired  for 
her  blog  at  Dooce.com.  ("Getting  Dooced"  has  since  become  Internet-speak  for  "getting 
fired  for  your  blog.")  www.dooce.com. 

Michael Arrington, influential TechCrunch blogger whose daily postings can make or break
a  2.0  startup.  Recently  under  fire  for  conflict-of-interest  issues,  Arrington  is  known  for  his 
Silicon  Valley  connections  and  the  blowout  parties  he  throws  at  his  Atherton  house. 
techcrunch.com 

John  Battelle,  host  of  the  Web  2.0  conference.  Battelle  is  a  former  Wired  editor  who  rode 
the  dot-com  boom  and  bust  as  founder  and  president  of  the  Industry  Standard  magazine 
and  has  written  about  Google  and  search  technology  in  his  book,  "The  Search."  His  latest 
venture, Federated Media, seeks to be an ad platform for high-traffic blogs.
www.battellemedia.com. 

Elisa Camahort, Jory Des Jardins and Lisa Stone, the founders of BlogHer, a Web site
and annual conference that gives voice to the wide range of women online.
www.blogher.org. Camahort's blog: homepage.mac.com/elisa_camahort/iblog. Des Jardins'
blog: jorydesjardins.com. Stone's blog: surfette.typepad.com.

Caterina Fake and Stewart Butterfield, the game-designing couple from Vancouver, British
Columbia, who created the photo-sharing site Flickr. Fake is often mentioned on a
first-name basis in Web 2.0 circles, as in: "Did you see what Caterina posted on her blog
today?" www.caterina.net. (Stewart's blog, www.sylloge.com, is idle; he posts
occasionally, with others, at blog.flickr.com.)

Tara  Hunt,  who  with  her  boyfriend,  Chris  Messina,  has  founded  Citizen  Agency,  a 
consulting  firm  built  on  a  loose  affiliation  of  independent  workers  with  plans  to  advise 
startups  on  building  their  products  and  their  communities.  Hunt  is  a  leading  advocate  of 
"pinko  marketing,"  or  the  notion  of  a  company  becoming  so  entwined  with  its  community 
that the community can change the products. www.horsepigcow.com.

Brewster  Kahle,  founder,  director  and  digital  librarian  of  the  Internet  Archive,  a  nonprofit 
organization  in  San  Francisco's  Presidio  that  seeks  to  provide  universal  access  to  all 
information. 

Kevin  Kelly,  founding  executive  editor  of  Wired  magazine,  where  he  remains  a  "senior 
maverick."  He  is  known  for  his  optimistic  view  of  technology,  expressed  in  a  Wired  article 
that  became  the  book  "The  Long  Boom"  and  in  a  more-recent  New  York  Times  magazine 
cover  story  on  how  book-scanning  endeavors  by  Google  and  the  Internet  Archive  may  lead 
to the long-held dream of a Library of Alexandria. www.kk.org.

Lawrence  Lessig,  founder  and  director  of  the  Center  for  Internet  and  Society  at  Stanford 
University,  and  chairman  and  chief  executive  of  Creative  Commons,  the  San  Francisco 
nonprofit  group  that  is  plowing  new  ground  in  copyright  law.  www.lessig.org/blog. 

Chris  Messina,  or  Factory  Joe  as  he's  known  online,  is  the  founder  of  BarCamp,  an 
open-source  un-conference  in  which  anyone  can  show  up  and  talk  about  tech.  Messina  is 
also an open-source evangelist and Web design consultant. www.factoryjoe.com/blog.

Craig  Newmark,  the  former  IBM  engineer  who  started  an  e-mail  list  to  tell  friends  about 
what  was  going  on  in  San  Francisco  and  saw  it  grow  into  an  international  Internet 
phenomenon  often  blamed  for  the  newspaper  industry's  struggles.  The  site, 
www.craigslist.com,  tells  of  goods  for  sale,  apartments  for  rent,  jobs  available  and  other 
deals in scores of cities around the world. Most of the ads are free. www.cnewmark.com.

Tim  O'Reilly,  founder  and  CEO  of  O'Reilly  Media  in  Sebastopol,  and  the  person  most 
identified  with  the  term  Web  2.0.  O'Reilly  runs  an  annual  Web  2.0  conference  -  the  one 
this  week  will  be  the  third  -  that  attracts  many  of  the  leading  lights  of  the  Internet  industry, 
talking  about  how  the  Web  is  changing,  and  the  implications  of  those  changes  in  the  world 
of business. radar.oreilly.com.

Zack  Rosen,  a  22-year-old  who  dropped  out  of  college  and  helped  build  technology 
applications  that  fueled  the  grassroots  interest  in  Howard  Dean's  presidential  campaign.  He 
heads  San  Francisco's  CivicSpace,  which  is  building  on  that  software  with  both  nonprofit 
and for-profit arms. www.zacker.org.

Jimmy  Wales,  a  former  options  trader  who  had  the  notion  to  build  an  open-source 
encyclopedia,  a  project  that  became  Wikipedia.  Wales  remains  chairman  of  the  Wikimedia 
Foundation that oversees the site, but also runs wiki-related for-profit ventures.
blog.jimmywales.com.

Landmark Fires Back at EFF

Organization  says  its  subpoena  of  Google  and  YouTube  is  self-protection,  not  free  speech 
muzzle. 

2006

An  attorney  for  Landmark  Education,  a  15-year-old  personal  development  organization,  called  charges  made 
by  a  digital  rights  group  that  it  is  using  copyright  law  as  a  tool  to  suppress  the  free  speech  rights  of  Google 
Video  and  YouTube  uploaders  inaccurate  and  unfounded. 

Attorney Art Schreiber, general counsel for Landmark Education, said the organization he represents was simply
protecting its intellectual property, not seeking to limit anyone's right to full and free self-expression.

"We took these actions solely to protect our copyrighted material, which is the principal source of our business
operations," Mr. Schreiber said in an email to Red Herring. "While freedom of speech on the Internet is essential,
it is also vital that copyrighted materials be protected."

Landmark has subpoenaed YouTube, Google Video, and Internet Archive seeking the identity of the individuals who
uploaded a French-made documentary that the group believes includes copyrighted portions of its program, the
Landmark Forum (see Google, YouTube Video Challenge).

"Portions  were  taped  without  authorization  by  a  person  who  was  in  the 
program  under  a  false  name,"  Mr.  Schreiber  said. 

EFF  Challenge 

The  Electronic  Frontier  Foundation  is  challenging  the  subpoenas  on  the  grounds  that  the  usage  of  the  material 
constitutes  fair  use  since  it  is  used  for  the  purpose  of  criticism,  commentary,  and  news  reporting. 

Fair  use  is  a  United  States  doctrine  that  spells  out  where  one  can  freely  use  copyrighted  material.  Fair  use 
frequently  involves  scholarship  or  review. 

The  case  revolves  around  a  French  documentary  entitled  Voyage  to  the  Land  of  the  New  Gurus,  which  includes 
hidden  camera  footage  taken  inside  a  Landmark  event  held  in  France,  and  additional  footage  inside 
Landmark's  offices  in  France. 

The  documentary  has  shown  up  on  YouTube,  Google  Video,  and  Internet  Archive. 

"Upon  learning  that  the  video  was  posted  on  several  web  sites,  we  availed  ourselves  of  the  rights  provided 
under  the  Digital  Millennium  Copyright  Act  (DMCA)  to  request  the  identity  of  the  people  who  posted  this 
video,"  Mr.  Schreiber  wrote  in  his  email. 

"The  Electronic  Frontier  Foundation  (EFF)  challenged  our  actions  and  alleged  to  the  press  that  our  copyright 
claims  were  bogus,  which  statement  was  then  disseminated  on  the  Internet,"  he  continued. 

Mr.  Schreiber  said  that  Landmark  Education's  goal  is  not  to  silence  anyone,  but  to  protect  its  core  IP  resources, 
which  were  infringed  by  the  video. 

"While  we  appreciate  the  work  of  the  EFF,  the  allegation  that  our  copyright  claim  is  bogus  is  entirely 
inaccurate,"  he  said.  "The  facts  are  clear  that  the  Landmark  Forum  program  has  for  many  years  been 
copyrighted. Materials covered by this copyright registration were included throughout the video."

Labs On-Site: The Internet Archive stores and protects petabytes
of data.

Ten years in the making, the Internet Archive, an ambitious project to store and archive all the Web pages on the
Internet along with other forms of digital content, houses more than 4 petabytes of data (1.6 petabytes of primary
data) using standards-based modular hardware and open-source software.

The organization's strategies for storing and managing that data can serve as best practices for any company trying
to get its arms around an ever-expanding data load.

Multiterabyte  data  centers  are  quite  common 
these  days,  but  petabyte-size  data  stores 
remain  somewhat  novel.  To  see  firsthand  how 
the  Internet  Archive  is  handling  the  storage  of  all 
its  data,  eWEEK  Labs  went  on-site  at  the  digital 
library's  San  Francisco  data  center. 

The  Internet  Archive  had  recently  relocated  its 
data  center  from  offices  in  the  Presidio  of  San 
Francisco.  In  fact,  IT  managers  had  just  finished 
moving  the  last  racks  of  servers  into  the  new 
location  two  weeks  prior  to  our  visit  in  October. 

Much  of  the  Internet  Archive's  success  has  to  do 
with  the  way  its  IT  managers  approach  the 
storage  of  large  amounts  of  data,  said  Brewster 
Kahle,  digital  librarian  and  founder  of  the  Internet 
Archive. 

"We  are  a  petabyte-oriented  facility,  and  the 
question  is,  How  do  we  work  and  store  petabytes 
of  information  that  are  constantly  accessible  to 
the  outside  world?"  said  Kahle,  during  eWEEK 
Labs' visit. "The answer is to have two practical
considerations: how to store this massive
amount of data and how to preserve it. Preservation and access are part of our mandate."

The  Internet  Archive  is  a  nonprofit  organization  founded  in  1996  with  the  purpose  of  building  an  online  library  made 
up  of  saved  Web  sites.  The  Internet  Archive  today  includes  all  manner  of  digital  formats,  including  text,  audio  and 
video, as well as archived Web pages. The collection, which can be accessed at www.archive.org, is
continually growing.

Funding  for  the  Internet  Archive  came  originally  from  Kahle  as  a  result  of  the  sale  of 
his  company,  WAIS  (Wide  Area  Information  Servers),  to  America  Online.  The 
Internet  Archive  is  now  funded  by  private  foundations,  government  grants  and  in-kind 
donations  from  corporations. 

In the beginning, the Internet Archive used Storage Technology's StorageTek
TimberWolf 9710 tape library with Quantum's DLT700 drives, the combination of
which could store as much as 70GB of data. (Storage Technology was acquired by
Sun Microsystems in 2005.) However, while the tape library was cost-efficient, the
disadvantage was its relatively slow access speed.

In 2000, Internet Archive IT managers decided to switch from the StorageTek tape library to desktop machines
from Hewlett-Packard. The desktops, each of which had four 160GB disk drives, sat on standard baker's racks
purchased from Costco Wholesale.

As  the  digital  library  grew,  Internet  Archive  IT  staffers  began  looking  for  cheaper  ways  to  store  data.  In  2004,  they 
developed  a  storage  system  called  the  PetaBox,  which  uses  a  combination  of  affordable  standards-based  parts  and 
open-source  software.  The  PetaBox  also  boasts  low  power  consumption.  The  Internet  Archive  eventually  spun  off  a 
company,  Capricorn  Technologies,  to  manufacture  and  sell  the  PetaBox  technology. 

Today,  the  Internet  Archive  has  about  2,000  PetaBox  systems  in  its  data  center.  The  PetaBoxes  are  used  to  crawl 
the Internet and to store Web pages and other digital content. Each of 50 racks houses 40 1U (1.75-inch) PetaBox
servers,  most  of  which  are  armed  with  dual-core  Opteron  processors  from  Advanced  Micro  Devices.  (Older 
PetaBoxes  use  ultra-low-voltage  processors  from  Via  Technologies.) 

Kahle  said  this  approach  helps  keep  costs  down  for  the  nonprofit  organization.  "We 
are  built  out  of  boxes  just  stacked  up  and  used  for  different  purposes,"  Kahle  said. 
"As  a  nonprofit,  one  of  the  biggest  [cost]  issues  for  us  is  in  the  building  of  the  data 
center— the  administration  and  the  power.  We're  trying  to  keep  all  of  these  factors 
under  control." 

PetaBox systems currently being installed each have four 750GB perpendicular hard
drives from Seagate Technology, providing up to 120TB of storage per rack. The
Internet Archive adds about one new rack of PetaBoxes per month, according to John
Berry, vice president of operations at the Internet Archive. Berry said he expects this trend to continue indefinitely.
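
The per-rack figure can be checked with a little arithmetic. The sketch below is only an illustration: it assumes decimal units (1 TB = 1,000 GB) and uses the drive count, drive size and the 40 servers per rack reported in the article; the function name is made up for this example.

```python
# Back-of-the-envelope check of the per-rack capacity quoted in the article.
# Inputs are the article's figures; decimal units (1 TB = 1,000 GB) are an assumption.

DRIVES_PER_SERVER = 4    # four drives in each PetaBox server
DRIVE_CAPACITY_GB = 750  # 750GB per drive
SERVERS_PER_RACK = 40    # 40 PetaBox servers in each rack
GB_PER_TB = 1_000        # assumption: decimal terabytes


def rack_capacity_tb(drives_per_server: int, drive_gb: int, servers_per_rack: int) -> float:
    """Raw capacity of one rack in terabytes, before any replication."""
    return drives_per_server * drive_gb * servers_per_rack / GB_PER_TB


raw_tb = rack_capacity_tb(DRIVES_PER_SERVER, DRIVE_CAPACITY_GB, SERVERS_PER_RACK)
print(f"Raw capacity per rack: {raw_tb:.0f} TB")
```

Under those assumptions the result matches the 120TB per rack cited above. It is a raw figure, so usable space would be lower once duplicate copies are counted.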

Potential  for  Failure 

With somewhere between 8,000 and 9,000 disks currently spinning in all these systems, disk failure is
common, with 2 to 3 percent of disks failing every year. There is no way to hot-swap the drives in the PetaBoxes,
so  servers  with  failed  disks  need  to  be  pulled  out  of  their  respective  racks.  Kahle  said  this  practice  is  tolerable  at 
the  Internet  Archive  because  data  isn't  updated  as  quickly  as  it  would  need  to  be  when  dealing  with  mission-critical 
enterprise  data. 
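
To put the failure rate in operational terms, the following sketch multiplies the article's disk counts by its failure percentages. It is a rough estimate, not the Archive's own calculation: it assumes failures are spread evenly over the year and that each failed disk means one server pulled from its rack.

```python
# Rough estimate of yearly and weekly disk failures from the article's figures.
# Assumptions: failures are spread evenly over the year, and each failed disk
# means one server pulled from its rack.

DISKS_LOW, DISKS_HIGH = 8_000, 9_000
RATE_LOW_PCT, RATE_HIGH_PCT = 2, 3
WEEKS_PER_YEAR = 52


def expected_annual_failures(disk_count: int, failure_rate_pct: float) -> float:
    """Expected number of failed disks per year for a given fleet size."""
    return disk_count * failure_rate_pct / 100


low = expected_annual_failures(DISKS_LOW, RATE_LOW_PCT)
high = expected_annual_failures(DISKS_HIGH, RATE_HIGH_PCT)
print(f"Expected failures per year: {low:.0f} to {high:.0f}")
print(f"Per week: {low / WEEKS_PER_YEAR:.1f} to {high / WEEKS_PER_YEAR:.1f}")
```

That works out to somewhere between 160 and 270 failed disks a year, or roughly three to five a week.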

The  Internet  Archive,  which  has  the  equivalent  of  three  full-time  system  administrators,  uses  Nagios,  an 
enterprise-class  open-source  network  monitoring  application.  Nagios  monitors  the  status  of  more  than  16,000 
checks  that  run  on  the  800  machines  that  make  up  the  Internet  Archive's  primary  cluster. 

Nagios  isn't  the  only  open-source  application  used  at  the  Internet  Archive.  The  PetaBoxes  run  Canonical's  Ubuntu 
distribution  of  Linux. 

The  Internet  Archive  also  makes  use  of  two  applications  for  the  PetaBoxes:  PetaBox  Catalog  manages  thousands  of 
tasks  running  across  the  cluster,  balancing  workloads  and  tracking  job  progress,  and  PetaBox  Control  Panel 
provides a Web interface for configuration and modification at the cluster, rack, node and partition levels.

To  Protect  and  Serve 

To  protect  data,  the  Internet  Archive's  IT  managers  tried  RAID  5.  However,  they  found  it  unable  to  scale  and  opted 
instead  to  use  a  JBOD  (just  a  bunch  of  disks)  configuration.  For  its  archive,  the  organization  uses  pairs  of  machines 
and  has  two  copies  of  everything  on  separate  machines.  The  Internet  Archive  also  has  copies  of  all  its  data  stored  in 
other  locations,  including  a  data  center  in  Amsterdam,  The  Netherlands,  and  the  new  Library  of  Alexandria,  in 
Egypt. 
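
One way to picture the paired-machine approach is a script that checks whether two copies of a directory tree still match. The sketch below is hypothetical and is not the Archive's tooling: it assumes both copies are reachable as local directories, and the function names and the choice of SHA-256 are illustrative only.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root: Path) -> set[str]:
    """Relative paths of every regular file under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_replicas(primary: Path, mirror: Path) -> dict[str, list[str]]:
    """Report files that are missing from either copy or whose contents differ."""
    primary_files = list_files(primary)
    mirror_files = list_files(mirror)
    shared = primary_files & mirror_files
    return {
        "missing_on_mirror": sorted(primary_files - mirror_files),
        "missing_on_primary": sorted(mirror_files - primary_files),
        "content_differs": sorted(
            name for name in shared
            if file_digest(primary / name) != file_digest(mirror / name)
        ),
    }
```

Calling `compare_replicas(Path("/data/a"), Path("/data/b"))` (both paths are placeholders) returns three sorted lists. An empty report means the two copies agree file for file.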

"If  there's  one  lesson  we  can  take  from  the  [destruction  of  the  original]  Library  of  Alexandria,  it's  don't  have  just  one 
copy,"  Kahle  said.  "We  wanted  to  build  the  Internet  Archive  to  ensure  that  we  don't  lose  the  great  works  of  today. 
The  only  way  we  could  do  that  is  to  have  multiple  copies  and  have  multiple  places  in  the  world  that  we  synchronize 
over the Internet."

The  Internet  Archive  uses  the  Internet  to  keep  its  computing  clusters  in  sync  with  one  another.  A  protocol  called 
OAI  (Open  Archives  Initiative)  is  used  for  metadata  harvesting.  HTTP  and  FTP  are  also  used  to  move  batches  of 
files. 
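
For readers unfamiliar with OAI metadata harvesting, the sketch below shows the general shape of a request and response under the OAI-PMH 2.0 protocol. The article does not describe the Archive's endpoints or sync scripts, so the base URL and the sample response are placeholders, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListIdentifiers request URL.

    When continuing a partial list, the protocol expects the resumption
    token to be sent on its own, without metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_identifiers(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response; the identifiers are invented for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:item-1</identifier><datestamp>2006-10-01</datestamp></header>
    <header><identifier>oai:example.org:item-2</identifier><datestamp>2006-10-02</datestamp></header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

ids, next_token = parse_identifiers(SAMPLE_RESPONSE)
print(list_identifiers_url("https://example.org/oai"))
print(ids, next_token)
```

A harvester would repeat the request with each returned token until none comes back, then fetch the files themselves over HTTP or FTP, as the article describes.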

Despite  the  massive  amounts  of  data  that  the  Internet  Archive  is  storing,  managing  and  preserving  for  posterity, 
Kahle  said  the  secret  to  the  organization's  success  is  keeping  it  simple. 

"We  don't  do  anything  that  isn't  immediately  obvious  to  college  students  with  Linux  on  their  dorm-room  desktop," 
Kahle  said.  "We  are  allergic  to  secret  sauce.  Everything  we  do  is  standardized  and  simple." 

Senior Writer Anne Chen can be reached at anne_chen@ziffdavis.com.

Microsoft releasing book search in beta

By Candace Lombardi

Microsoft is releasing Live Search Books, its competitor to Google Book Search, in beta on
Wednesday.

The  book  search  engine  performs  keyword  searches  for  books  that  have  been  scanned  as  part  of  Microsoft's  book 
scanning  project,  in  the  same  way  that  Windows  Live  Search  searches  the  Internet,  said  Danielle  Tiedt,  the  general 
manager  of  Live  Search  Selection  for  Microsoft. 

Initially,  the  Live  Search  Books  database  will  be  searchable  from  the  book  search 
engine's  beta  home  page,  or  as  a  category  on  the  main  Windows  Live  Search 
page, a method referred to as vertical search. Once the tool is out of beta,
Microsoft  plans  to  incorporate  all  of  the  scanned  publications  into  its  general 
Internet  search  engine.  The  company  hopes  to  do  this  in  the  next  six  months, 
according to Tiedt.

"As we move out of beta, what you will see is that book content integrated with the
Web content (search results on Windows Live Search). What we are focusing more
of our efforts on for live searching is integrating all of those content types together to give you the most relevant
results. Sometimes the most relevant will be from books. If, for example, it's a search on historical content, chances
are the most authoritative content may be found in a book," said Tiedt.

Live  Search  Books'  "Search  inside  a  book"  feature  also  allows  users  to  search  the  full  texts  of  scanned  books. 
Microsoft  has  restricted  the  beta  release  of  Live  Search  Books  to  only  include  noncopyright  books  scanned  from  the 
collections  of  the  British  Library,  the  University  of  California  and  the  University  of  Toronto. 

The  company  plans  to  add  books  currently  being  scanned  by  robotic  machines  from  the  New  York  Public  Library, 
Cornell  University  and  the  American  Museum  of  Veterinary  Medicine  within  the  next  month.  In  a  later  release,  Microsoft 
will  also  be  adding  copyright  works  that  publishers  have  given  permission  to  include  in  the  scanning  project. 

All  of  the  books  in  the  Live  Search  Books  database  will  offer  full  text  views,  according  to  Tiedt. 

"We've  focused  on  making  the  search  experience  really  impactful... Since  we  are  (only  scanning  public  domain  or 
authorized  works)  for  all  of  the  books,  people  will  have  full  access  to  all  of  the  text.  This  will  make  the 
search-inside-of-a-book  feature  easy  to  use  and  customer  friendly,"  said  Tiedt. 

Microsoft's  new  tool  is  similar  in  nature  to  Google  Book  Search  in  that  it  also  allows  full  texts  of  public  domain  works  to 
be  viewed,  searched  or  printed.  Like  Google,  Microsoft  has  chosen  to  use  PDF  files  for  the  full  text  downloads  of 
books. 

Microsoft has restricted its book scanning project to noncopyright books, with publishers having the option to opt in if
they  want  in-copyright  publications  to  be  scanned  for  the  project. 

"We  feel  very  strongly  about  copyright.  All  the  library  scanning  we  do  is  (noncopyright)  stuff,  and  then  we  work  with 
publishers  to  produce  (copyright)  stuff.  We  don't  do  any  mass  scanning  of  in-copyright  works,"  said  Tiedt. 

The  policy  contrasts  with  that  of  Google,  which  has  been  scanning  all  the  books  from  participating  libraries,  but  only 
making  public  domain  books  available  for  full  text  views.  That  has  not  stopped  several  authors  and  publishers,  both  in 
the U.S. and abroad, from filing suit.

As part of its defense in the U.S. lawsuit filed by The Authors Guild, Google has subpoenaed several other companies
that have book scan projects, including Microsoft, Yahoo and Amazon. While Amazon and Yahoo have issued
objections to the subpoena, Microsoft has not yet issued a formal response, and would not respond to repeated
requests for comment on the matter.

"Microsoft  is  not  issuing  an  official  statement  around  the  subpoena  issue,"  a  spokesman  for  Microsoft  said  in  an 
e-mail. 

Microsoft  also  plans  to  announce  on  Wednesday  the  addition  of  medical  content  to  its  Windows  Live  Academic  Search, 
an  engine  that  searches  full  texts  of  journals  in  conjunction  with  institutions'  subscriptions  to  them.  The  addition  of 
medicine  as  a  category  will  "practically  quadruple"  the  amount  of  available  searchable  content,  according  to  Tiedt. 

Copyright ©1995-2006 CNET Networks, Inc. All rights reserved.

Publish And Perish

Elisabeth Eaves, 12.01.06, 12:00 ET

Nothing  is  safe.  Not  your  e-mails,  digital  photos  or  Word  files.  Not  old  newspapers  or  books.  When  it  comes  to  storing  information,  everything  will 
disappear  into  digital  obsolescence  or  crumble  to  dust. 

Even White House e-mails, important blueprints and influential works of 20th-century literature, the very artifacts that you'd expect would be carefully
preserved, are at risk of being lost forever.

The National Archives and Records Administration, the agency responsible for preserving the federal government's documents, realized in the 1990s
that  it  couldn't  cope  with  the  digital  era  using  its  old  electronic  storage  system  of  magnetic  tapes.  The  White  House  under  the  Bush  Administration 
alone  will  generate  as  many  as  100  million  e-mails.  Copying  them  would  take  years.  NARA  has  contracted  Lockheed  Martin  to  build  a  federal  digital 
archive,  but  the  system  won't  be  ready  until  at  least  September  of  2007. 

The Library of Congress, meanwhile, ditched most of its original newspaper collection after transferring the content to microform, which uses a
machine to read film. But Nicholson Baker, in his book Double Fold: Libraries and the Assault on Paper, says the medium is at least as iffy as paper:
Some early acetate films "shrink, buckle, bubble or stick together in a solid illegible lump," he writes. In the '80s libraries switched to polyester-based
films.  But  some  types  of  polyester  films  are  prone  to  spots,  others  attract  fungus  and  another  suffered  "complete  image  loss"  when  exposed  to  the  high 
heat  of  common  microform  readers. 

Books aren't safe either. Librarians say that most works published between the mid-1800s and the mid-1980s are disintegrating thanks to the high acid
content  in  their  paper.  A  rescue  treatment  known  as  mass  deacidification  is  commercially  available,  mainly  from  Pennsylvania-based  Preservation 
Technologies.  But  it's  expensive,  says  Thomas  Teper,  head  of  preservation  at  the  University  of  Illinois  library.  As  a  result,  he  says,  "no  U.S.  library 
has been deacidified completely." By the 1980s most publishers had started using acid-free paper, at least for hardcover editions. But Dianne van der
Reyden,  head  of  preservation  at  the  Library  of  Congress,  has  a  new  worry  over  the  rising  use  of  recycled  paper.  "Every  time  it's  recycled,  it  becomes 
weaker,"  she  says. 

The dream of preserving all human knowledge is an ancient one, dating back at least to the Library of Alexandria, which began assembling papyrus
scrolls circa 300 BC. But Alexandria burned down, and as knowledge grew exponentially, the possibility of uniting it once again grew more
distant, until the advent of computers, when our capacity to store words and images suddenly became vast.

While  digital  technology  promised  huge  amounts  of  virtual  warehouse  space,  though,  our  data  are  not  all  safe  and  accessible.  Some  computer 
scientists have dubbed this era a "digital dark age" because we may end up with no record of it. Part of the problem is the breakneck pace of
technological change, which results in alarming cases of obsolescence. Several years ago, U.S. Navy engineers noticed that diagrams of the USS
Nimitz, a nuclear-powered aircraft carrier, had been subtly transformed by new software. Over at NASA, early spaceflight data stored on digital tape
had  deteriorated  irreversibly  by  the  1990s.  Of  course,  untold  numbers  of  people  have  experienced  the  personal  calamity  of  losing  the  contents  of  their 
home  computers,  thanks  to  hard-drive  crashes. 

Alexander  Rose,  the  executive  director  of  the  futurist  Long  Now  Foundation,  worries  about  the  impermanence  of  digital  information.  "If  you  save  that 
computer for 100 years, will the electrical plugs look the same?" he asks. "The Mac or the PC, will they be around? If they are, what about the
software?" So far there's no business case for digital preservation; in fact, for software makers like Microsoft, planned obsolescence is the plan.

"The  reality  is  that  it's  in  companies'  interest  that  software  should  become  obsolete  and  that  you  should  have  to  buy  every  upgrade,"  Rose  says.  We 
could be on the cusp of a turning point, though, in the way businesses and their customers think about digital preservation. "Things will start to change
when  people  start  losing  all  of  their  personal  photos,"  Rose  said. 

So what, if anything, can be done? In the short term, at least, open-source software and nonproprietary file formats, like .txt, .xml and .html, give you
the greatest chance of migrating your documents forward as technology changes. As for the historical record, the Internet Archive, a nonprofit
organization that collaborates with the Library of Congress and the Smithsonian, is going some of the distance to save us from a dark age. It captures
Web  pages  before  they  disappear  and  stores  them  in  its  searchable  Wayback  Machine.  Co-founder  Brewster  Kahle  says  that  if  no  one  recorded  all  the 
material  originating  online,  "we'd  live  in  the  perpetual  present,  in  which  any  organization  could  change  history  by  taking  down  the  Web  page." 

Gathering  information  is  one  thing,  saving  it  another.  To  keep  its  digital  files  accessible,  the  Internet  Archive  has  to  move  them  to  a  new  system  every 
three  years,  Kahle  says,  and  the  organization  is  beginning  large-scale  data  swaps  with  foreign  libraries.  "The  real  answer  for  digital  preservation  is 
diligence  and  don't  just  have  one  copy,"  he  says.  "You  can  be  faced  with  institutional  instability,  government  instability,  geographic  instability." 

Massive book-scanning projects like the one launched by Google may help preserve literature by making it more accessible. "The broader the access
to any resource, the more likely it is to survive," says Rose. That's because someone is more likely to notice, and raise the alarm, when a format
stops  working.  On  the  other  hand,  "as  the  digitization  projects  proceed,  the  desire  for  universities  to  hold  onto  their  physical  books  may  decrease,"  says 
Kahle.  Space-strapped  libraries  could  decide  to  send  old  books  to  the  dump,  just  as  they  have  done  with  hundreds  of  thousands  of  historic  newspapers. 
And  scanned  books  are  as  vulnerable  to  technological  change  and  obsolescence  as  other  digital  formats. 

Can anything last forever? The Long Now Foundation is micro-etching its 15,000-page Rosetta Project, an archive of data on human languages, onto a
3-inch metal disk it hopes will last at least 10,000 years. But we still may not have improved on 4,000-year-old technology. Asked what the most
permanent  medium  is,  Kahle  doesn't  miss  a  beat:  "The  clay  tablets  of  the  Babylonians.  Their  libraries  are  readable  to  us  today." 

By Candace Lombardi

The nonprofit Internet Archive announced Wednesday it has received $1 million from the Alfred P.
Sloan  Foundation  to  continue  its  effort  to  scan  public  domain  works  for  open  online  accessibility. 

The  archiving  organization's  Open-Access  Text  Archive  is  an  open-source  alternative  to  book-scanning  efforts  like  the 
ones from Google and Microsoft. Internet Archive, perhaps best known for its Wayback Machine archive of Web pages
by date, is also an online digital library of text, audio, software, images and video content.

"Brewster  Kahle  and  the  Internet  Archive  are  pioneers  in  this  exciting  and  historic  opportunity  to  create  a  universal 
digital library that is both open-access and non-proprietary," said Doron Weber, who oversees public understanding of
science  and  technology  at  the  Sloan  Foundation,  in  a  statement. 

Kahle  was  one  of  the  inventors  of  Wide  Area  Information  Servers  (WAIS),  a  text-based  search  system  that  searched 
database  indexes  on  remote  servers  before  there  were  Internet  search  engines.  After  WAIS  was  sold  to  AOL  in  1995 
for  several  million  dollars,  Kahle  founded  the  Internet  Archive,  which  works  closely  with  the  Open  Content  Alliance 
(OCA).  The  OCA  developed  a  set  of  principles  dedicated  to  a  "permanent  archive  of  multilingual  digitized  text  and 
multimedia  content"  for  free  and  open  access. 

The  grant  from  the  Sloan  charitable  trust  will  enable  Internet  Archive  and  the  OCA  to  scan  collections  from  several 
major  institutions,  including  the  entire  collection  of  publications  from  the  Metropolitan  Museum  of  Art  as  well  as  several 
thousand  images  from  the  museum;  John  Adams'  personal  library  of  over  3,800  works  at  the  Boston  Public  Library; 
and  other  collections  from  The  Getty  Research  Institute,  Johns  Hopkins  University  and  the  University  of  California, 
Berkeley. 

The  announcement  comes  just  after  the  San  Francisco-based  Internet  Archive  reached  the  milestone  of  scanning 
100,000  books.  That  may  not  sound  like  a  lot  compared  to  Google  Book  Search's  claim  of  millions  within  a  decade,  but 
the  OCA  has  ramped  up  its  scanning  recently  to  about  12,000  books  a  month.  According  to  its  own  statistics,  the 
organization  has  also  archived  65  billion  pages  from  50  million  Web  sites. 

"Google  is  so  good  at  the  media  being  their  PR  machine,  that  you  would  not  know  there  was  an  alternative  out  there," 
Kahle  said.  "We  have  brand  name  institutions  going  open  and  foundations  like  the  Sloan  are  funding  (us).  It  shows  that 
the Open Content Alliance is viable, that there is support for public interest. We don't have to privatize the library
system." 

Google  has  begun  to  offer  full-text,  printable  PDFs  of  public  domain  works  with  plans  to  add  more  as  it  scans  more 
books.  But  its  platform  is  closed,  and  its  PDF  pages  have  a  "Digitized  by  Google"  watermark.  The  company  is  not 
planning  to  share  its  scanned  material  with  the  OCA  or  Internet  Archive,  according  to  Kahle. 

"We  think  they  (Google)  are  doing  great  stuff.  If  the  materials  would  be  made  available  for  broad  public  search  and 
educational  use  we'd  be  all  for  it,  but  in  my  discussion  with  the  founders  (Google  co-founders  Larry  Page  and  Sergey 
Brin)  they  aren't  going  to,"  said  Kahle. 

Google  did  not  respond  to  requests  for  comment  about  its  book  scanning  project. 

Google scans and indexes both public domain and copyright works, an issue that has raised legal concerns. The Google
Book Search engine restricts full access to copyright works while still offering snippet views, instead of excluding the
work from its search feature altogether, according to the Google Book Search Web site.

"This whole Google Book Search looks like Amazon's Search Inside the Book," said Kahle. "Let's go open with these
collections...These are beautiful books."

Yahoo is a supporter of the OCA and has helped the OCA index some of the scanned content, but its project is smaller
than those of Google and Microsoft, according to Gregory Crane, a classics professor and digital library expert at Tufts
University.

Microsoft  was  an  early  supporter  of  the  OCA  and  in  June  worked  with  it  on  a  project 
scanning  and  indexing  materials  from  the  University  of  California  and  the  University  of  Toronto  libraries  as  part  of  its 
Windows  Live  Book  Search  project.  But  Microsoft  has  become  more  proprietary  in  recent  months,  Kahle  said. 

"We  continue  to  work  with  Microsoft,  but  the  results  going  forward  are  not  strictly  OCA  principles,"  Kahle  later  added  in 
an  e-mail.  "To  their  credit,  they  are  interested  in  helping  get  more  scanning  done  in  the  open,  of  course  because  they 
can use the books as well, but still, this is more than other projects."

Jay  Girotto,  who  heads  Microsoft's  Live  Book  Search  selection  team,  further  explained  his  company's  position. 

"We  support  the  fundamental  mission  of  the  OCA,  and  hope  that  many  more  partners  like  the  Sloan  Foundation  will 
step  forward  and  contribute  significant  resources  to  scan  public-domain  materials  under  the  OCA  principles,"  he  said  in 
a  statement. 

Research  impacts 

Tufts' Crane thinks the companies are reluctant to share for fear of helping the competition.

"My impression is that both Microsoft and Google don't want the other benefiting from their investment," he wrote in an
e-mail. "Now each is hoarding. Ideally, each would split the cost of digitizing content and then make the public domain
material available in the OCA. At the moment, Google is well ahead, and I would think that they would feel that
Microsoft  would  benefit  too  much." 

A lack of open-source access, Crane explained, impedes research that requires access to multiple groups of works in
bulk, and prevents researchers from applying more nuanced OCR (optical character recognition) searches to those
texts.

"We are evaluating OCR on classical Greek. Google runs OCR on all its texts; that's how it generates searchable OCR.
The Google OCR, though, doesn't know Greek and produces no usable text as far as we can tell. Google says that
you have to get permission to run OCR, etc. on its PDF books," Crane said, further explaining, "Even if the PDF books
are good enough quality to support OCR, they might be lower than the archival resolution."

"I  am  sure  that  Google  would  be  open  to  us  doing  this  work,  but  that  means  (for  each  academic  project)  getting  their 
attention,  writing  letters,  and  a  lot  of  hassle,"  Crane  said.  "I  think  it's  easier  and  better  in  the  long  run  to  open  the 
library  up  and  let  the  world  have  at  it,"  he  said. 

Copyright ©1995-2006 CNET Networks, Inc. All rights reserved.

tensions revolve around Google's insistence on chaining the digital content to its li


t-leading search engine and the nine major libraries that have aligned themselves with the Mountain


approach  to  prevent  mankind's  accumulated  knowledge  from  being  controlled  b 
keep it open and certainly don't want any company to enclose it," said Doron V


I  entity,  even  if  it  is  a  company  like  Google  that 
,  program  director  of  public  understanding  of  science  and 


i  Archive,  a  leader  in  the  Open  Content  Alliance, 


) pay for digital copies of


d  by  the  Boston  Public 


s  to  be  scanned  include  the  personal  library  of  John  Adams,  America's  second  president,  and  thousands  of  images  from  the  Metropolitan  Museum 

I grant also will be used to scan a collection of anti-slavery material provided by the Johns Hopkins University Libraries and documents about the Gold Rush from a library at the University of California at


The  deal  represents  a  coup  for  Internet  Archive  founder  Brewster  Kahle,  a  strident  c 
They don't want the books to appear in anyone else's search engine but their own,


>  and  publishers  nevertheless  h 


f  the  controls  that  Google  has  imposed  on  its  book-scanning  initis 
) is a little peculiar for a company that says its mission is to make
o scan copyrighted material without explicit permission. Google


ally  accessible,"  Kahle  said 

' small excerpts from the copyrighted




I  Google  for  copyright  infringement  in  a  year-old  case  t 


>  slowly  wending  its  way  through  federal  court 

.  Most  of  the  roughly  100,000  books  that  the  e 


le also is footing a bill expected to run in the
its  digital  copies,  other  search  engines  are  being  encouraged  to  index  the  material  t 


s  of  millions  to  make  the  digital  copies  -  a  commitment 


Although  the  Open  Content  Alliance  depends  on  the  Internet  Archive  t 

Yahoo  and  Microsoft 

Both  Yahoo  and  Microsoft,  which  run  the  two  largest  search  engines  behind  Google,  belong  to  the  alliance.  The  group  has  more  than  60  members,  consisting  mostly  of  li 

None of Google's contracts prevent participating libraries from making separate scanning arrangements with other organizations, said company spokeswoman Megan Lamb

"We encourage the digitization of more books by more organizations," Lamb said. "It's good for readers, publishers, authors and libraries."

ock  its  search  engine  with  unique  material  to  give  people  more  reasons  to  visit  its  Web  s 


ays  he  was  disappointed  by 


mams  more  worried  about  Google's  book-scanning  initiative  because  it  has  gathered  so  much  attention  and  support 

; part of universities. They are: Harvard, Stanford, Michigan, Oxford, California, Virginia, Wisconsin-Madison, and Complutense of Madrid. The

!  we  could  get  more  from  being  a 

currently studying the

I Greek. If the same problem were to crop up with a digital book in the Open Content Alliance, Crane thinks it will be more


Google "may end up aiming for the lowest common denominator and not be able to do anything really deep" with the digital books, Crane s

We worry so much about click-through
rates,  ad  campaigns,  and  keyword  pricing 
that  it's  easy  to  miss  out  on  the  more 
creative  side  of  what  the  Internet  can  offer. 

Thanks to a grant the Internet Archive received from the Alfred P. Sloan Foundation, Brewster Kahle and
company will be able to bring a number of historic collections of works into the digital realm.

Once in place, these works will be available without restrictions on their use. It's a scenario that Kahle
has  worked  hard  to  encourage  through  his  work  on  the  Open  Content  Alliance,  which  he  discussed  in 
a  phone  call  with  us. 

The  million-dollar  grant  will  enable  works  from  five  different  collections  to  be  made  available  digitally. 
In  discussing  the  scanning  of  works  and  repurposing  of  existing  digital  content  (the  Metropolitan 
Museum  of  Art  wants  OCA  links  to  come  to  its  hi-res  images),  Kahle  touched  on  the  topic  of  Google's 
book  scanning  endeavors. 

He's  not  a  fan  of  Google's  agreements  with  libraries  in  exchange  for  their  content,  and  he  claimed 
others  were  beginning  to  look  askance  at  Google's  requests.  "Some  people  are  reading  those 
agreements,"  Kahle  said.  "Libraries  don't  want  a  perpetual  lockdown  of  their  content." 

If  Google  should  decide  to  embrace  OCA's  methods,  they  would  be  welcomed  with  open  arms.  One  of 
Kahle's  goals  has  been  to  bring  the  search  advertising  giant  into  the  fold. 

Until  then,  Kahle  and  OCA  will  be  delighted  to  put  Sloan's  grant  to  good  use.  Along  with  the  archive 
of  publications  and  several  thousand  key  images  from  the  Met,  the  OCA  will  gain  access  to  collections 
spanning  the  country: 

Boston  Public  Library:  The  John  Adams  collection,  which  is  the  complete  personal  library  of  the 
Founding  Father,  lifelong  book  collector  and  second  President  of  the  United  States. 

The  Getty  Research  Institute:  Major  collection  of  books  on  art  and  architecture  and  an  alternate 
collection  on  the  performing  arts. 

Johns Hopkins University Libraries: The James Birney Collection of Anti-Slavery materials.

Bancroft Library of the University of California at Berkeley: Key primary texts documenting the
California  Gold  Rush  and  Western  expansion. 

Although  the  technology  side  of  scanning  occupies  an  important  place  in  OCA's  efforts,  Kahle  was 
careful  to  emphasize  the  essential  need  for  the  human  side  of  the  equation.  He  said  leveraging  librarians 
will  be  very  important.  Their  skills  at  assembling  sensible  collections  and  cataloging  them  will  make 
works  much  more  accessible  and  useful  to  the  public. 

Older  material  poses  a  challenge,  Kahle  said;  he  also  recommended  a  resource  for  some  classical 
material placed online at Tufts University. There, Professor Gregory Crane oversees the Perseus
Project. 

It is an effort to mash up places mentioned throughout classical literature with online maps and other
digital resources to bring people a look at the geographic settings for historic events. For example, 300,
an  adaptation  of  Frank  Miller's  graphic  novel  of  the  same  name  about  the  Spartan  stand  at 
Thermopylae,  will  be  in  theaters  in  2007. 

The  curious  moviegoer  can  use  Perseus  to  view  Thermopylae  as  it  is  today,  through  images  collected  in 
its  database.  One  image  shows  a  view  from  a  Spartan  burial  mound  looking  south-southwest  at  the 
cliffs  above  the  Pass  of  Thermopylae. 

As digital archiving efforts like those of OCA and Perseus gain in content, more of the world will be
revealed to a greater number of people. Kahle and Crane likely hope we will put that knowledge to
good use over time.

The quest for the universal library.
BY  JEFFREY  TOOBIN 


Every weekday, a truck pulls up to the
Cecil H. Green Library, on the campus
of Stanford University, and collects at
least a thousand books, which are taken to
an undisclosed location and scanned, page
by page, into an enormous database being
created by Google. The company is also
retrieving books from libraries at several
other leading universities, including Harvard
and Oxford, as well as the New York
Public Library. At the University of Michigan,
Google's original partner in Google
Book Search, tens of thousands of books
are processed each week on the company's
custom-made scanning equipment.

Google intends to scan every book ever
published, and to make the full texts
searchable, in the same way that Web sites
can be searched on the company's engine
at google.com. At the books site, which is
up and running in a beta (or testing) version,
at books.google.com, you can enter a
word or phrase—say, Ahab and whale—
and the search returns a list of works in
which the terms appear, in this case nearly
eight hundred titles, including numerous
editions of Herman Melville's novel.
Clicking on "Moby-Dick, or The Whale"
calls up Chapter 28, in which Ahab is
introduced. You can scroll through the
chapter, search for other terms that appear
in the book, and compare it with other
editions. Google won't say how many
books are in its database, but the site's
value as a research tool is apparent; on it
you can find a history of Urdu newspapers,
an 1892 edition of Jane Austen's letters,
several guides to writing haiku, and a Harvard
alumni directory from 1919.

No one really knows how many books
there are. The most volumes listed in any
catalogue is thirty-two million, the number
in WorldCat, a database of titles from
more than twenty-five thousand libraries
around the world. Google aims to scan at
least that many. "We think that we can do
it all inside of ten years," Marissa Mayer,
a vice-president at Google who is in
charge of the books project, said recently,
at the company's headquarters, in Mountain
View, California. "It's mind-boggling
to me, how close it is. I think of Google
Books as our moon shot."

Google's is not the only book-scanning
venture. Amazon has digitized hundreds
of thousands of the books it sells,
and allows users to search the texts; Carnegie
Mellon is hosting a project called
the Universal Library, which so far has
scanned nearly a million and a half books;
the Open Content Alliance, a consortium
that includes Microsoft, Yahoo, and several
major libraries, is also scanning thousands
of books; and there are many smaller
projects in various stages of development.
Still, only Google has embarked on a
project of a scale commensurate with its
corporate philosophy: "to organize the
world's information and make it universally
accessible and useful."

In part because of that ambition, Google's
endeavor is encountering opposition.
A federal court in New York is considering
two challenges to the project, one
brought by several writers and the Authors
Guild, the other by a group of publishers,
who are also, curiously, partners in
Google Book Search. Both sets of
plaintiffs claim that the library component
of the project violates copyright law. Like
most federal lawsuits, these cases appear
likely to be settled before they go to trial,
and the terms of any such deal will shape
the future of digital books. Google, in an
effort to put the lawsuits behind it, may
agree to pay the plaintiffs more than a
court would require; but, by doing so, the
company would discourage potential
competitors. To put it another way, being
taken to court and charged with copyright
infringement on a large scale might be the
best thing that ever happens to Google's
foray into the printed word.

Though Google has more than ten
thousand employees—about fifty
new ones are hired each week—and a
market capitalization of more than a
hundred and fifty billion dollars, the
company cultivates the air of a college
campus at its headquarters, in Silicon
Valley. Now and then, there are self-consciously
wacky stunts, like Pajama
Day, which happened to take place
when I visited. (The event was to be
madcap within reason; supervisors were
told to convey the message that "pajamas
means 'pajamas,' not 'what you
sleep in.' ") When I met with Sergey
Brin, a co-founder of Google, he was
wearing bright-blue p.j.s, with the
company's logo stitched on the breast
pocket.

The story of how Brin and Google's
other co-founder, Larry Page, met as
graduate students in computer science at
Stanford in the mid-nineties, and devised
a series of elegant software algorithms
that allowed Web searchers to
find relevant information quickly and
efficiently, has become part of Silicon
Valley lore. Less well known is that, at
the time, Brin and Page were also working
on Stanford's Digital Library Technologies
Project, an attempt, funded by
the federal government, to organize
different kinds of stored information, including
books, articles, and journals, in
digital form. "There was an attitude in
computer science that putting things on
dead trees was obsolete and getting it all
into a searchable, digital format was a
quest that had to be accomplished someday,"
Terry Winograd, a Stanford professor
who was a mentor to Page and
Brin, said.

After founding Google, in 1998, Page
and Brin—who are now in their mid-thirties
and worth around fourteen billion
dollars each—began to talk about
how to include books in the company's
database. Page, in particular, embraced
the idea of putting books online; at one
point, he set up a primitive lab in his
office, with a scanner and a page-turning
machine. "I think it was motivating to
have those kinds of aspirations, but nobody
really took it seriously," Brin told
me. The men were less interested in
making it easy for people to obtain the
full texts of books online than in making
accessible the information those books
contained. "We really care about the
comprehensiveness of a search," Brin
said. "And comprehensiveness isn't just
about, you know, total number of words
or bytes, or whatnot. But it's about having
the really high-quality information.
You have thousands of years of human
knowledge, and probably the highest-quality
knowledge is captured in books.
So not having that—it's just too big an
omission." As Marissa Mayer put it,
"Google has become known for providing
access to all of the world's knowledge,
and if we provide access to books
we are going to get much higher-quality
and much more reliable information.
We are moving up the food chain."

Publishers have sued Google for breaching copyright. A settlement seems likely, but it may not be in the public's interest.


In 2002, Google quietly made overtures
to several libraries at major universities.
The company proposed to
digitize the entire collection free of
charge, and give the library an electronic
copy of each of its books. "Larry
is an undergrad alum here at Michigan,
and he knew we were already interested
in digitizing the library as part of our
preservation efforts," John Wilkin, an
associate university librarian at Michigan,
told me. "There was a lot of back-and-forth
between Google and us in
the process. We wanted to insure that
the materials wouldn't be damaged and
that what came out could be used as
a preservation surrogate. They started
experimenting with different ways of
copying the images, and we started
a pilot project in July, 2004. We've
been getting better, going faster. We're
doubling our output all the time." The
Michigan library holds seven million
volumes, and Wilkin believes that Google
will have copied the entire collection in
about six years.

Last month, at the New York Public
Library, Google hosted a conference
on the future of the publishing
industry. About four hundred people—mainly
publishing executives and
agents—attended, most of them grimly
aware of the simultaneous lethargy and
panic that have characterized their industry's
response to the digital age. Nearly all
attempts to sell books in an electronic format
have been disappointing, and now
Google appeared to be encroaching on
the publishers' domain. The implicit
message of the conference was summed
up by a quotation from Charles Darwin
that was projected on a screen: "It is not
the strongest of the species that survive,
nor the most intelligent, but the ones
most responsive to change." As Laurence
Kirschbaum, a longtime publishing executive
who recently became a literary agent,
told me at the conference, "Google is now
the gatekeeper. They are reaching an audience
that we as publishers and authors
are not reaching. It makes perfect sense to
use the specificity of a search engine as a
tool for selling books."

Google  thought  so,  too,  and  designed 
the  books  project  accordingly.  In  addi- 
tion to  forming  partnerships  with  librar- 
ies, the  company  has  signed  contracts 
with  nearly  every  major  American  pub- 
Usher.  When  one  of  these  publishers' 


books  is  called  up  in  response  to  search 
queries,  Google  displays  a  portion  of  the 
total  work  and  shows  links  to  the  pub- 
lisher's Web  site  and  online  shops  like 
Amazon,  where  users  can  buy  the  book. 
"We  are  helping  the  publishers  reach 
consumers  that  otherwise  might  not 
have  known  about  their  books  and  help- 
ing them  market  their  books  by  giving 
limited  but  relevant  previews  of  the 
books,"  Jim  Gerber,  Google's  director  of 
content  partnerships,  told  me.  "The  In- 
ternet and  search  are  custom  made  for 
marketing books. When there are a hun-
dred and seventy-five thousand new
books  each  year,  you  can't  market  each 
one  of  those  books  in  mass  market. 
When  someone  goes  into  a  search  en- 
gine to  learn  more  about  a  topic,  that  is  a 
perfect  time  to  make  them  aware  that  a 
given book exists. Publishers know that
'browse  leads  to  buy.' "  (Google  says  that 
it  does  not  take  a  cut  of  sales  made  through 
its  books  site.) 

Still,  on  October  19,  2005,  several 
leading  publishers,  including  Simon  & 
Schuster,  the  Penguin  Group,  and  Mc- 
Graw  Hill — all  of  which  are  partners  in 
Google  Book  Search — filed  a  lawsuit 
against  the  company,  seeking  to  stop  the 
project. The publishers don't object to
Google's  plan  for  helping  them  sell  new 
books,  but  they  assert  that  the  library 
component  of  the  project  is  ille- 
gal. They  claim  that  Google's  "massive, 
wholesale and systematic copying of en-
tire books still protected by copyright"
infringes on the publishers' rights. They
demand that Google stop further copy-
ing and "destroy all unauthorized copies
made  by  Google  through  the  Google  Li- 
brary Project  of  any  copyrighted  works." 
(The  Authors  Guild  filed  its  lawsuit 
around  the  same  time.)  The  publishers, 
who  have  the  support  of  the  Association 
of  American  Publishers,  are  suffering 
from  a  version  of  the  problem  that  John 
Kerry  had  in  the  last  Presidential  cam- 
paign: they  are  for  Google  Book  Search 
at  the  same  time  that  they  are  against  it. 

Copyright  law  dates  to  the  birth  of  the 
Republic.  Article  I  of  the  Constitu- 
tion assigns  Congress  the  right  to  pass 
laws  "securing  for  limited  Times  to  Au- 
thors and  Inventors  the  exclusive  Right 
to  their  respective  Writings  and  Discov- 
eries." The  first  copyright  law  was  passed 
in  1790,  and  it  has  been  frequently  and 
confusingly  amended  over  the  years, 
most recently in the Sonny Bono Copy-
right Term Extension Act of 1998,
which  extended  copyright  terms  by 
twenty  years.  (The  law  is  also  known  as 
the  Mickey  Mouse  Protection  Act,  be- 
cause the  Walt  Disney  Company,  seek- 
ing to  protect  its  copyright  on  early  ani- 
mated classics  like  "Steamboat  Willie," 
lobbied  heavily  for  it.)  The  twisted  his- 
tory of  copyright  law  has  insured  an  awk- 
ward passage  into  the  digital  age. 

The  legal  assertion  at  the  core  of 
Google's  business  plan  is  its  purported 
right  to  scan  millions  of  copyrighted 
books  without  payment  to  or  permis- 
sion from  the  copyright  owners.  Ap- 
proximately twenty  per  cent  of  all  books 
are  in  the  public  domain;  these  include 
books  that  were  never  copyrighted,  like 
government  publications,  and  works 
whose  copyrights  have  expired,  like 
"Moby-Dick."  Google  has  simply  cop- 
ied such  books  and  made  them  available 
on  the  Web.  Roughly  ten  per  cent  of 
books  are  copyrighted  and  in  print — 
that is, actively being sold by publishers.
Many  of  these  books  are  covered  by 
Google's  arrangement  with  its  publisher 
partners,  which  allows  the  company  to 
scan  and  display  parts  of  the  works. 

The  vast  majority  of  books  belong  to 
a  third  category:  still  protected  by  copy- 
right, or  of  uncertain  status,  and  out  of 
print.  These  books  are  at  the  center  of 
the  conflict  between  Google  and  the 
publishers.  Google  is  scanning  these 
books in full but making only "snippets"
(the company's term) available on the
Web.  (Google  searches  turn  up  only  the 
search  term  and  about  twenty  words 
on  either  side  of  it.)  Copyright  law  has 
never  forbidden  all  "copying"  of  a  pro- 
tected work;  scholars  and  journalists 
have  long  been  allowed  to  quote  por- 
tions of  copyrighted  material  under  the 
doctrine  of  fair  use.  Google  maintains 
that  the  chunks  of  copyrighted  material 
that  it  makes  available  on  its  books  site 
are  legal  under  fair  use.  "We  really  anal- 
ogized book  search  to  Web  search,  and 
we  rely  on  fair  use  every  day  on  Web 
search,"  David  C.  Drummond,  a  senior 
vice-president  at  Google  who  is  over- 
seeing the  response  to  the  lawsuits,  told 
me.  "Web  sites  that  we  crawl  are  copy- 
righted. People  expect  their  Web  sites 
to  be  found,  and  Google  searches  find 
them.  So,  by  scanning  books,  we  give 
books  the  chance  to  be  found,  too." 
(Google  also  has  an  "opt  out"  policy, 
which  allows  copyright  holders  to  re- 
quest that  specific  titles  be  omitted  from 
the  company's  database.) 
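
The "snippet" display described above, a search term plus about twenty words on either side, can be illustrated with a minimal Python sketch. This is not Google's code: the function name `snippet`, the single-word matching, and the punctuation handling are all assumptions made for illustration.

```python
from typing import Optional

def snippet(text: str, term: str, window: int = 20) -> Optional[str]:
    """Return the first occurrence of a single-word `term` together with
    up to `window` words on either side, or None if it never appears.
    Hypothetical helper; the real system is not described in the article."""
    words = text.split()
    target = term.lower()
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "whale," still matches "whale".
        if word.strip(".,;:!?\"'()").lower() == target:
            start = max(0, i - window)
            return " ".join(words[start:i + window + 1])
    return None
```

The full text still has to be held somewhere for a function like this to run, which is the plaintiffs' point: only the excerpt is shown, but the whole book is copied.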

However,  according  to  the  plaintiffs 
in  the  cases  against  Google,  the  act  of 
copying  the  complete  text  amounts  to  an 
infringement,  even  if  only  portions  are 
made available to users. "What they are
doing, of course, is scanning literally mil-
lions of copyrighted books without per-
mission," Paul Aiken, the executive di-
rector of the Authors Guild, said. "Google
is  doing  something  that  is  likely  to  be 
very  profitable  for  them,  and  they  should 
pay  for  it.  It's  not  enough  to  say  that  it 
will  help  the  sales  of  some  books.  If  you 
make  a  movie  of  a  book,  that  may  spur 
sales,  but  that  doesn't  mean  you  don't  li- 
cense the  books.  Google  should  pay.  We 
should  be  finding  ways  to  increase  the 
value  of  the  stuff  on  the  Internet,  but 
Google  is  saying  the  value  of  the  right  to 
put  books  up  there  is  zero." 

Google  asserts  that  its  use  of  the 
copyrighted  books  is  "transformative," 
that  its  database  turns  a  book  into  es- 
sentially a  new  product.  "A  key  part  of 
the  line  between  what's  fair  use  and 
what's  not  is  transformation,"  Drummond 
said.  "Yes,  we're  making  a  copy  when 
we  digitize.  But  surely  the  ability  to  find 
something  because  a  term  appears  in  a 
book  is  not  the  same  thing  as  reading 
the book. That's why Google Books is a
different product from the book itself."
In other words, Google says that being
able to search books on its site — which it
describes  as  the  equivalent  of  a  giant  li- 
brary card  catalogue — is  not  the  same  as 
making  the  books  themselves  available. 
But  the  publishers  cite  another  factor  in 
fair-use  analysis:  the  amount  of  the  copy- 
righted work  that  is  used  in  the  creation 
of  the  new  one.  Google  is  copying  entire 
books,  which  doesn't  sound  "fair"  to  the 
plaintiff publishers and authors. "Tradi-
tional copyright analysis
says  that  a  transformation 
leads  to  the  creation  of  a 
new  and  independent  work, 
like  a  parody  or  a  work  of 
criticism,"  Jane  Ginsburg,  a 
professor  at  Columbia  Law 
School,  said.  "Copying  the 
entire  work,  which  is  what  Google  is 
doing,  does  not  preclude  a  finding  of  fair 
use,  but  it  does  fall  outside  the  traditional 
paradigm." 

Harvard,  Stanford,  and  Oxford  have 
prohibited  Google  from  scanning  copy- 
righted works  in  their  collections,  limit- 
ing the  company  to  books  that  are  in  the 
public  domain.  Because  of  the  opacity 
of  copyright  law,  and  the  extension  of 
protections mandated by the 1998 act,
it's not always clear which works are
still protected. (Copyright status can be-
come murky when authors die or pub-
lishing houses go out of business.) Stan-
ford has drawn a line at 1964 and
prohibited  Google  from  copying  most 
works  published  since  that  date.  "When 
Google  got  sued,  we  got  nervous,"  Mi- 
chael A.  Keller,  the  university  librarian 
at  Stanford,  told  me.  "We're  not  a  pub- 
lic institution.  We  don't  have  any  state 
immunity  from  being  sued  ourselves,  so 
we  started  sorting  out  the  stuff  that  we 
know  is  public  domain."  (Several  of  the 
public  institutions  that  are  Google's 
partners,  including  the  Universities  of 
Michigan,  California,  Virginia,  and 
Texas  at  Austin,  are  allowing  the  scan- 
ning of  copyrighted  material.) 

The  chief  engineer  of  Google's  sys- 
tem for  scanning  books  in  the  li- 
brary collections  is  Dan  Clancy,  who 
joined  the  company  after  eight  years  at 
NASA,  where  he  supervised  teams  of 
Ph.D.s working on problems related to
artificial  intelligence.  Google  provides 
its  employees  with  free  food  twenty- 
four  hours  a  day,  and  Clancy,  a  tall, 
shambling man with a shock of white-
blond hair, conducted most of our con-
versations with bits of granola bar cling-
ing to his shirt.

"Previously,  when  people  have  done 
scanning,  they  always  were  constrained 
by  their  budget  and  their  scale,"  Clancy 
told  me.  "They  had  to  spend  all  this 
time  figuring  out  which  were  the  perfect 
ten  thousand  books,  so  they  spent  as 
much  time  in  selection  as  in  scanning. 
All the technology out there
developed solutions for
what I'll call low-rate scan-
ning. There was no need
for  a  company  to  build  a 
machine  that  could  scan 
thirty  million  books.  Do- 
ing this  project  just  using 
commercial,  off-the-shelf  technology 
was  not  feasible.  So  we  had  to  build  it 
ourselves." 

Google will not discuss its proprietary
scanning  technology,  but,  rather  than  in- 
vesting in  page-turning  equipment,  the 
company  employs  people  to  operate  the 
machines,  I  was  told  by  someone  famil- 
iar with  the  process.  "Automatic  page- 
turners  are  optimized  for  a  normal  book, 
but  there  is  no  such  thing  as  a  normal 
book,"  Clancy  said.  "There  is  a  great  deal 
of  variability  over  books  in  a  library,  in 
terms  of  size  or  dust  or  brittle  pages."  (To 
needle  Google,  several  blogs  have  posted 
images  from  the  books  site  that  include 
the  scanners'  fingers.)  Google  will  not 
reveal  how  much  it  is  spending  on  the 
books  project.  In  2005,  Microsoft  an- 
nounced that  it  would  spend  two  and  a 
half  million  dollars  to  scan  a  hundred 
thousand  out-of-copyright  books  in  the 
collection  of  the  British  Library.  At  this 
rate,  scanning  thirty-two  million  books — 
the  number  in  WorldCat's  database — 
would cost Google eight hundred mil-
lion dollars, a major but hardly extrava-
gant expenditure for a multibillion-dollar
corporation. 
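
The eight-hundred-million-dollar figure is a straight per-book extrapolation from the Microsoft deal. A short Python sketch of that arithmetic, using only the numbers quoted above (the variable names are illustrative, and Google's real costs are undisclosed):

```python
# Figures quoted in the text; variable names are illustrative.
microsoft_budget_usd = 2_500_000   # announced spend on the British Library scan
microsoft_books = 100_000          # out-of-copyright books covered by that spend
worldcat_books = 32_000_000        # books listed in WorldCat's database

cost_per_book = microsoft_budget_usd / microsoft_books
total_cost = cost_per_book * worldcat_books

print(f"${cost_per_book:.2f} per book, ${total_cost:,.0f} in total")
# prints "$25.00 per book, $800,000,000 in total"
```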

Copying  all  those  pages  presents 
many  difficulties,  but  writing  software  to 
make  the  books  useful  to  searchers  is 
even  harder.  "The  scanning  technology  is 
boring,"  Clancy  said.  "The  real  challenge 
is  to  get  somebody  something  that  they 
are  actually  interested  in,  inside  a  book. 
Web  sites  are  part  of  a  network,  and 
that's  a  significant  part  of  how  we  rank 
sites  in  our  search — how  much  other 
sites  refer  to  the  others."  But,  he  added, 
"Books  are  not  part  of  a  network.  There 
is a huge research challenge, to under-
stand the relationship between books."

Still, the basic search protocols func-
tion well. A search for "Heart of Dark-
ness" leads immediately to Joseph Con-
rad's novel, which is not as obvious as it
sounds, considering how common the
words in the title are. As Clancy said, "If
you  put  in  'Heart  of  Darkness,'  we  have 
to  know  that  you're  looking  for  the  novel, 
not  a  book  about  lighting  conditions  in 
cardiac  surgery.  So  how  do  we  do  that? 
We  rank  some  words  more  important 
than others. The title may matter more
than  the  content,  so  we  may  weight  that 
more.  You  could  also  look  at  what  other 
people  have  searched  for,  so  if  everyone 
who  searched  for  'Heart  of  Darkness' 
clicked  on  the  novel,  we  might  figure 
that  you  probably  will,  too." 
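
Clancy's two signals, weighting title words above body text and learning from what earlier searchers clicked, can be combined in a toy scoring function. The Python sketch below is an illustration only: the `Book` class, the weights, and the additive formula are assumptions, not a description of Google's ranking.

```python
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    body: str
    clicks: int = 0  # hypothetical count of past searchers who chose this book

def score(book: Book, query: str,
          title_weight: float = 3.0, click_weight: float = 0.5) -> float:
    """Count query words in the title and body, weighting title matches
    more heavily, then add a bonus for past clicks (all weights assumed)."""
    terms = query.lower().split()
    title_words = book.title.lower().split()
    body_words = book.body.lower().split()
    title_hits = sum(title_words.count(t) for t in terms)
    body_hits = sum(body_words.count(t) for t in terms)
    return title_weight * title_hits + body_hits + click_weight * book.clicks

def rank(books: list, query: str) -> list:
    """Return the books ordered from highest to lowest score."""
    return sorted(books, key=lambda b: score(b, query), reverse=True)

novel = Book("Heart of Darkness", "the river and the darkness", clicks=10)
manual = Book("Lighting in Cardiac Surgery",
              "the heart is lit in darkness of the theatre of surgery")
print([b.title for b in rank([manual, novel], "heart of darkness")])
```

With these made-up inputs the surgical manual matches more query words in its body, but the title weight and the click bonus put the novel first.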

The  most  important  data  for  ranking 
searches,  Clancy  explained,  may  come 
from  Web  pages  that  link  to  books  in 
Google's  database.  (For  instance,  if  links 
on  the  phrase  "Clinton's  autobiography" 
direct  users  to  a  copy  of  "My  Life"  on 
the  books  site,  there  is  a  high  probabil- 
ity that  people  who  use  the  same  search 
terms will also want this result.) "We
just  started,  and  we  need  to  make  these 
books  networked,  and  we  need  people 
to  help  us  do  that,"  Clancy  said. 

Google's  database  contains  many 
books  in  languages  other  than  English, 
but  for  now  they  must  be  searched  in 
the  original  tongue.  On  the  company's 
Web  site,  there  is  already  a  primitive 
translation  feature,  and  it  may  someday 
be  enhanced  to  allow  books  to  be  ren- 
dered in  another  language  at  the  touch 
of  a  button.  "In  terms  of  democratiza- 
tion, you  want  to  be  able  to  access  infor- 
mation," Clancy  told  me.  In  places  like 
the  Arab  world,  where  few  titles  are 
translated  into  the  local  languages  each 
year,  he  said,  access  to  the  world's  books 
could  have  a  substantial  impact.  "We 
are  talking  about  a  universal  digital  li- 
brary," Clancy  went  on.  "I  hope  this 
world  evolves  so  that  there  exists  a  time 
where  somebody  sitting  at  a  terminal 
can  access  all  the  world's  information." 

Such  messianism  cannot  obscure  the 
central  truth  about  Google  Book 
Search:  it  is  a  business.  Google  has 
pledged  not  to  show  advertising  next  to 
the pages of library books, but the com-
pany does sell advertising alongside
search results that lead to books obtained
from  publishers.  Google's  prospects  for 
producing  revenue  from  the  books  proj- 
ect appear  rather  modest,  but  the  com- 
pany has  often  made  a  profit  on  ventures 
that  initially  seemed  unlikely  to  be  lucra- 
tive. "We've  had  this  fortunate  streak  that 
when  we've  done  things  that  have  im- 
pacted our  users  and  society  as  a  whole — 
positively, in a significant way — we've
been  rewarded  by  that  downstream  in 
some  way,  even  though  we  may  not  have 
envisioned  exactly  what  it  was  right 
offhand,"  Sergey  Brin  told  me.  "We 
didn't  have  ads  when  we  first  put  up  Web 
search. It wasn't clear it was great busi-
ness when we started search. In fact, the
companies  that  were  doing  search  were 
moving  away  from  it.  But  we  just  thought 
it  was  important,  and  we  thought  that 
where there was a will there would be
a  way.  And  in  fact  it  turned  out  to  be 
a  great  way  to  make  money — doing 
search  with  targeted  advertising.  And  I 
think  you'll  find  the  same  sort  of  thing 
here." 

The key legal question is whether the
courts will allow Google to continue to
scan  copyrighted  material  without  per- 
mission. But  the  schedule  of  the  lawsuits 
may  turn  out  to  be  as  significant  as  the 
merits  of  the  cases,  which  are  before  Judge 
John  E.  Sprizzo.  In  keeping  with  the 
stately pace of federal litigation, the depo-
sitions of witnesses are to begin sometime
this  year,  and  the  parties  will  be  allowed  to 
file  motions  for  summary  judgment — in 
Google's case, to dismiss the suits — in
early  2008.  Then  there  could  be  a  trial.  If 
the  cases  are  appealed,  they  could  linger 
well into the next decade.

However,  most  people  involved  in  the 
dispute believe that a settlement is likely.
"The  suits  that  have  been  filed  are  a  busi- 
ness negotiation  that  happens  to  be  go- 
ing on  in  the  courts,"  Marissa  Mayer 
told  me.  "We  think  of  it  as  a  business  ne- 
gotiation that  has  a  large  legal-system 
component  to  it."  According  to  Pat 
Schroeder,  the  former  congresswoman, 
who  is  the  president  of  the  Associa- 
tion of  American  Publishers,  "This  is 
basically  a  business  deal.  Let's  find  a 
way  to  work  this  out.  It  can  be  done. 
Google  can  license  these  rights,  go  to  the 
rights  holder  of  these  books,  and  make 
a  deal." 

The  terms  of  such  a  deal  aren't  hard 
to imagine. The Authors Guild is con-
cerned that pirated copies of the books
on  Google's  site  could  leak  to  the  pub- 
lic, and  so  the  organization  would  in- 
sist on  security  measures.  (Sadly,  for 
writers  and  publishers,  demand  for 
their  products  has  never  been  robust 
enough  to  generate  a  major  piracy 
problem.)  As  for  distribution  of  the 
proceeds  from  the  site,  Google  might 
agree to share revenue with publishers.
In the way that radio stations pay for
the  music  they  play,  publishers  could 
receive  a  fee  based  on  a  statistical  anal- 
ysis of  how  often  their  books  are  viewed. 
Google  could  pay  in  cash  or  in  kind,  with 
advertising. 

But  a  settlement  that  serves  the  par- 
ties' interests  does  not  necessarily  benefit 
the public. "It's clearly in both sides' in-
terest to settle," Lawrence Lessig, a pro-
fessor at Stanford Law School, said.
"Businesses  in  Internet  time  can't  wait 
around  for  years  for  lawsuits  to  be  re- 
solved. Google  wants  to  be  able  to  get 
this  done,  and  get  permission  to  resume 
scanning  copyrighted  material  at  all  the 
libraries.  For  the  publishers,  if  Google 
gives  them  anything  at  all,  it  creates  a 
practical  precedent,  if  not  a  legal  prece- 
dent, that  no  one  has  the  right  to  scan 
this  material  without  their  consent.  That's 
a  win  for  them.  The  problem  is  that  even 
though a settlement would be good for
Google  and  good  for  the  publishers,  it 
would  be  bad  for  everyone  else." 

Libraries  have  recognized  for  some 
time  that  they  must  adapt  to  the 
digital  age,  and  many  have  taken  steps 
in  that  direction.  In  1995,  Stanford 
founded  the  HighWire  Press,  which 
now  provides  electronic  access  to  more 
than  a  thousand  scholarly  journals.  A 
few  years  later,  Stanford  digitized  most 
of  its  card  catalogue,  and  circulation  of 
its  books  increased  by  fifty  per  cent. 
"Once  our  students  could  sit  in  their 
dorm  rooms  and  find  out  what  we  had 
in  the  library,  they  sought  out  more 
books,"  Michael  Keller,  the  university 
librarian,  says.  Individual  libraries  some- 
times received  grants  to  scan  specific 
collections — in  2001,  the  New  York 
Public  Library  used  federal  money  to 
digitize  a  substantial  portion  of  the  col- 
lection at  its  Schomburg  Center  for  Re- 
search in  Black  Culture — but  a  compre- 
hensive effort  seemed  inconceivable. 
According  to  Paul  LeClerc,  who  has 
been the president of the New York
Public  Library  for  the  past  thirteen 
years,  "For  the  first  decade  of  my  tenure, 
I was always asked, 'Weren't libraries
going  to  go  online?'  And  I'd  say  of 
course  we  want  to  do  it,  but  it's  not 
going  to  happen,  because  no  one  is 
going  to  give  us  the  money  to  do  it.  No- 
where on  the  horizon  was  that  amount 
of  money  predictable  or  identifiable. 
Then  came  Google.  This  struck  us  as 
being  the  quickest,  the  fastest,  and  the 
most  efficient  way  of  getting  large-scale 
additions  to  our  collections  online  for 
free  use." 

Among  Google's  potential  competi- 
tors in  the  field  of  library  digitization  are 
members  of  the  Open  Content  Alli- 
ance, which  facilitates  various  scanning 
projects  around  the  country  and  over- 
seas. Funded  largely  by  Microsoft  and 
the  Alfred  P.  Sloan  Foundation,  the 
O.C.A.  has  formed  alliances  with  many 
companies  and  institutions,  including 
the  Boston  Public  Library,  the  Ameri- 
can Museum  of  Natural  History,  and 
Johns  Hopkins  University.  For  the  mo- 
ment, though,  the  O.C.A.'s  members 
are  copying  only  material  in  the  public 
domain  (and  works  from  copyright 
owners  who  have  given  explicit  permis- 
sion), which  limits  the  scope  of  the  proj- 
ects substantially. 

Google's  advantage  may  well  be  ce- 
mented if  the  company  settles  its  law- 
suits with  the  publishers  and  authors.  "If 
Google  says  to  the  publishers,  'We'll 
pay,'  that  means  that  everyone  else  who 
wants  to  get  into  this  business  will  have 
to say, 'We'll pay,' " Lessig said. "The
publishers  will  get  more  than  the  law 
entitles  them  to,  because  Google  needs 
to get this case behind it. And the set-
tlement will create a huge barrier for any
new  entrants  in  this  field." 

In  other  words,  a  settlement  could 
insulate  Google  from  competitors, 
which  would  be  especially  troubling,  be- 
cause the  company  has  already  proved 
that  when  it  comes  to  searches  it  is  not 
infallible.  "Google  didn't  get  video 
search  right — YouTube  did,"  Tim  Wu, 
a  professor  at  Columbia  Law  School, 
said.  (Google  solved  that  problem  by 
buying  YouTube  last  year  for  $1.6  bil- 
lion.) "Google  didn't  get  blog  search 
right — technorati.com  did,"  Wu  went 
on.  "So  maybe  Google  won't  get  book 
search  right.  But  if  they  settle  the  case 
with the publishers and create huge bar-
riers to newcomers in the market there
won't  be  any  competition.  That's  the 
greatest  danger  here." 

The most striking thing about Pa-
jama Day at Google was how few
people  participated.  Most  of  the  rank 
and  file  saw  the  stunt  for  the  manufac- 
tured fun  that  it  was.  They  came  to  work 
in  their  usual  slacker  uniforms  of  jeans 
and T-shirts — which are, in their way, as
conformist  as  white  shirts  and  ties  were 
at  I.B.M.  in  the  nineteen-sixties.  Google, 
as  its  employees  seem  to  recognize,  can- 
not pretend  to  be  anything  other  than  a 
large  and  powerful  corporation. 

It's  easy  to  mock  Google's  unofficial 
motto — "Don't  be  evil" — but  there  is 
nothing  evil  about  Google  Book  Search. 
At  the  same  time,  there  is  nothing  in- 
herently virtuous  about  it.  Google  has 
succeeded  because,  on  the  whole,  it  has 
developed  excellent  products;  it's  folly 
to  judge  the  company's  behavior  on 
moral  grounds.  Its  shareholders  cer- 
tainly don't. 

Nor  can  publishers  and  authors,  who 
are  struggling  for  a  way  to  survive  in  a 
new age, portray their conflict with the
company as one between good and evil.
The  dual  status  of  several  leading  pub- 
lishers as  both  partner  and  adversary 
to  Google  underscores  their  desperate 
need  to  hedge  their  bets  in  a  digital 
world  that  they  have  yet  to  master.  The 
publishers'  complaint  against  Google 
states  that  "the  Publishers  support  mak- 
ing books  available  in  digital  form  so 
that  those  books  can  be,  among  other 
things,  researched  through  electronic 
means."  That  may  be  true  in  theory,  but 
trade  publishers,  in  particular,  have  been 
slow  to  embrace  new  technology,  espe- 
cially for  out-of-print  books;  Google 
will  almost  certainly  bring  more  atten- 
tion to  these  works  than  their  own  pub- 
lishers have. 

The  law  is  supposed  to  resolve  issues 
like these — between self-interested par-
ties with reasonable claims and legiti-
mate arguments. But the rules of copy-
right are so ambiguous, and the courts
so  slow,  that  the  judicial  system  serves 
largely  to  implement  the  law  of  the  jun- 
gle. "There  is  a  real  opportunity  to  move 
books  into  the  digital  arena,"  Marissa 
Mayer  told  publishers  during  the  con- 
ference at  the  New  York  Public  Library. 
"And  we  are  going  to  do  it  together."  ♦ 

An effort among Internet activists to halt the extension of copyright protections for orphan
works (out-of-print books and media) was dealt a setback last week by a U.S. appeals court
decision. 

The  case,  Kahle  v.  Gonzales,  was  filed  in  2004  by,  among  others,  Internet  Archive  co-founder  and  director  Brewster 
Kahle. Plaintiffs argued that extending such copyrights harmed the public's ability to access orphan works. The
Internet Archive has been joined by companies like Google, Yahoo and Microsoft in attempting to gain public domain
status  for  these  works. 

But  a  U.S.  district  court  had  already  rejected  the  lawsuit,  and  last  week,  the  Ninth  Circuit  U.S.  Court  of  Appeals  upheld 
the  lower  court's  decision,  saying  that  plaintiffs'  arguments  were  essentially  the  same  as  those  rebuffed  by  the  U.S. 
Supreme  Court  in  2003  in  Eldred  v.  Ashcroft,  which  affirmed  the  constitutionality  of  new  copyright  laws  expanding  the 
protections  for  orphaned  works. 

For  Kahle,  the  ruling  was  a  blow  to  his  goal  of  preserving  as  many  forms  of  media  as  possible  for  posterity.  But  he 
hardly views the result as a final defeat.

Still,  Kahle  and  the  Internet  Archive  are  also  gaining  momentum,  and  recently  received  a  $1  million  grant  from  the 
Sloan  Foundation  for  the  scanning  of  public  domain  works. 

Recently, Kahle visited CNET's Second Life auditorium for a discussion in front of an eager audience about the case,
as  well  as  about  the  Internet  Archive,  Nicholas  Negroponte's  $100  laptop  project  and  other  issues. 

Q:  Please  explain  the  mission  of  the  Internet  Archive. 

Kahle:  We're  out  to  help  build  the  Library  of  Alexandria  version  2,  starting  with  humankind's  published  works,  books, 
music,  video,  Web  pages,  software,  and  make  it  available  to  everyone  anywhere  at  anytime,  and  forever.  We  started 
archiving  the  Web  in  1996  with  snapshots  every  two  months  of  all  publicly  accessible  Web  pages.  The  "Wayback 
Machine" is now about 85 billion pages and 1.5 petabytes. Then we moved on to books, music and video. We work with
great lawyers, the U.S. Copyright Office, the Library of Congress and the American Library Association. We have
30,000  movies,  100,000  audio  recordings  and  now  we're  digitizing  books. 

How  do  you  deal  with  the  copyright  issues? 

Kahle:  For  the  Web,  we  followed  the  structure  of  the  search  engines  and  the  opt-out  system  for  doing  the  first-level 
archiving.  If  folks  write  to  us  not  wanting  to  be  archived,  then  we  take  them  out.  For  music,  we  offered  free  unlimited 
storage  and  bandwidth,  forever,  for  the  recording  of  "trader  friendly"  bands  in  the  tradition  of  the  Grateful  Dead. 

We  now  have  over  2,000  bands  and  36,000  concerts.  With  packaged  software,  our  lawyers  told  us  that  digital  rights 
management  (DRM)  would  pose  a  problem  under  the  Digital  Millennium  Copyright  Act  (DMCA),  so  we  got  an  exemption 
from the Copyright Office allowing us to rip software and break the copy protection for archival purposes. With books,
we  are  starting  with  out-of-copyright  (works)  and  wanting  to  move  to  orphan  works,  then  out-of-print  works,  then  finally 
in-print  (works).  We  digitize  12,000  books  a  month  and  have  100,000  on  the  site  now  for  free  use  and  download.  But 
we just had a setback. Larry Lessig brought a suit on our behalf, Kahle v. Gonzales, to allow orphan works to be on
digital  library  shelves.  But  the  9th  Circuit  U.S.  Court  of  Appeals  just  rejected  it. 

Can you talk more about Kahle v. Gonzales?

Kahle: Fundamentally, this is an issue for the Supreme Court and the Congress:
what kind of world do we want in the digital era? Do we want to have libraries like we
grew up with, ones with old and new books available to those that go to the library?
Or do we only want what corporations are currently peddling? Of course people want
the library, but how do we do that in such a way it does not sink an industry?

Libraries  worked  because  they  were  a  pain  to  go  to.  So  instead  of  frequently  going 
to  a  library  for  new  books,  people  went  to  book  stores.  Also,  libraries  spend  $3  to  $4  billion  each  year  on  publishers' 
products.  So  how  do  we  build  a  digital  environment  and  ecology  that  allows  new  works  to  get  created  and  paid  for, 
preserve  them  long-term,  provide  access  to  the  underprivileged,  provide  a  different  kind  of  access  for  scholarship  and 
journalism and all in the new world? It is not simple. But it is important.

Talk  about  book-scanning  projects  currently  going  on. 

Kahle:  There  are  a  couple  of  major  scanning  projects  in  this  country;  Google  is  leading  one,  and  a  large  group  of 
libraries  and  archives  are  working  together  on  another.  Also,  there's  the  Open  Content  Alliance,  which  is  attempting  to 
keep  the  public  domain  public  domain,  so  if  a  book  passes  into  the  public  domain,  the  digital  version  is  not  locked  up 
again  as  a  copyrighted  work.  There  are  other  projects  that  are  putting  perpetual  restrictions  on  what  can  be  done  with 
digitized  public  domain  works. 

That's  a  bit  scary  from  my  point  of  view.  We  need  help  keeping  the  libraries  open  and  unencumbered  by  new 
restrictions  on  public  domain  works.  We  have  been  able  to  scan  books  for  a  total  cost  of  10  cents  a  page,  so  about 
$30  a  book.  And  what  we  really  need  is  more  folks  to  want  this  done  or  want  to  scan  themselves. 
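
Kahle's two cost figures imply an average book length, and together with the 12,000 books a month he mentions earlier they suggest a rough monthly outlay. A back-of-the-envelope Python sketch, assuming every book costs the quoted $30 (the monthly total is an inference, not a figure Kahle gives):

```python
# Figures quoted in the interview, kept in whole cents and dollars
# to avoid floating-point rounding. Variable names are illustrative.
cost_per_page_cents = 10       # "10 cents a page"
cost_per_book_usd = 30         # "about $30 a book"
books_per_month = 12_000       # "We digitize 12,000 books a month"

pages_per_book = cost_per_book_usd * 100 // cost_per_page_cents
monthly_cost_usd = books_per_month * cost_per_book_usd

print(f"{pages_per_book} pages per book, ${monthly_cost_usd:,} per month")
# prints "300 pages per book, $360,000 per month"
```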

Who  are  the  natural  "enemies"  of  the  Internet  Archive? 

Kahle:  Everyone  seems  to  like  the  idea  of  preservation  of  cultural  materials.  But  folks  are  nervous  about 
disruptions  in  commercial  practices  that  are  just  now  getting  formed.  Libraries  and  publishing,  however, 
have  always  existed  in  parallel.  What  happened  is  that  some  overzealous  copyright  laws  got  passed  with 
heavy  lobbying  from  folks  like  Disney  and  these  are  screwing  things  up.  I  think  of  it  as  collateral  damage. 
Instead  of  keeping  just  Mickey  Mouse  or  just  the  profitable  works  under  copyright  for  longer,  they 
fundamentally  changed  the  structure  of  copyright.  So  the  problem  we  find  mostly  is  not  that  we  are 
stepping  on  toes,  it's  that  we  run  the  risk  of  stepping  on  a  legal  landmine  from  a  previous  war. 

You  have  one  of  the  first  $100  laptops.  What  is  your  take  on  that  project? 

Kahle: When I got to hold the $100 laptop in my hand, I had one of those experiences that does not happen very often: This
is  important.  The  organization  is  nonprofit,  the  goal  is  great,  and  it  has  to  be  open  to  succeed.  It  is  bottom-up,  built  for 
Linux  and  openness.  We  are  a  library  for  the  machine,  so  we  hope  to  have  millions  of  new  users  in  the  coming  year.  I 
am  very  interested  in  the  rise  of  the  technical  nonprofits.  There  are  very  interesting  things  happening  there,  where  the 
new  products  out  of  big  companies  are  getting  more  locked  down  and  closed  all  the  time. 

How  is  the  Wayback  Machine  distributed  around  the  world? 

Kahle:  We  have  our  servers  in  San  Francisco.  What  happens  to  libraries  is  they  are  burned,  and  they  are  usually 
burned  by  governments.  So  we  are  working  to  build  an  "international  library  system"  of  a  few  great  libraries  that  have 
exchange  agreements.  Our  first  was  the  library  of  Alexandria  in  Egypt,  and  they  got  a  full  copy  of  what  we  have  and 
vice  versa.  They  are  scanning  Arabic  books.  We  are  just  starting  to  work  with  the  European  Archive  in  Amsterdam. 
They have a partial mirror and are looking for funding and help. It is an exciting time and scary time. Hard drives fail all
the  time,  people  screw  up  and  governments  make  bad  calls. 

Rik Riel (from the audience) asks: What's your opinion on the potential threats of ISPs throttling certain content
(i.e. violating Net neutrality)?

Kahle: It is a huge and important issue. A way to frame it is that in the '80s, the battle was at the "transport layer."
Basically ArpaNet/Internet vs. the phone companies. We, in the open world, made huge wins, companies prospered and all
sorts of things went great. The battle in the '90s was at the software level: browsers, protocols, etc. Basically it was
the open world of the Web vs. the closed worlds of AOL and Lexis/Nexis. Again, we made huge wins there. Yes, the dominant
browser was closed source, but it talked the open protocols. And the great progress of Firefox, Linux and Ubuntu give
reason for hope at that layer. The 2000s is the battle at the content layer: open or closed. iTunes is a loser in this
view. DRM, central control, etc. Google's restrictions on the books they are
scanning  is  a  loss  on  this  front.  So  we  need  real  help  to  build  an  open  content  layer  that  is  not  centrally  controlled. 
Wikipedia  is  a  great  example  of  a  win.  But  now  we  are  seeing  new  attacks  on  fronts  we  thought  we  won--most 
particularly  the  transport  layer.  The  phone  companies  have  all  gotten  back  together  again  to  make  their  monopoly.  In 
the  ultimate  thumbing  of  the  nose  they  are  calling  it  AT&T.  And  they  are  at  their  old  tactics  again.  So  we  have  to  fight 
like  nuts  to  keep  the  transport  open,  the  software  open  and  the  content  open.  It  is  good  for  the  public  and  it  is  good 
for  businesses.  It  is  just  not  good  for  monopolies,  and  that  is  a  good  thing  in  most  people's  views. 

AlexisJ Onmura (from the audience) asks: Which new technology--if the Internet Archive had the
opportunity to try--do you think can do what stone has done for ancient civilizations in terms of
longtime storage?

Kahle:  You  can  make  a  durable  printout  on  something  like  stone  and  the  like,  but  I  would  like  to  argue  for  something 
else.  If  you  look  at  the  world  as  a  whole  since  writing  started  in  Sumeria,  there  has  been  an  up-and-running  civilization 
somewhere. So I believe we can have long term storage and access--which is key--by building a set of International
Libraries in different jurisdictions that have active trade agreements. When one melts down, then when they come back
up, the others can and are motivated to rebuild it. If this were in place, I could sleep

A World Without Late Fees

The web is the new public library, but who's going to own the books?

By  Richard  Koman 


SO YOU'RE in school and you have to write a
research report on, say, the anti-slavery
movement in the United States in the
mid-19th century. Where do you go?


Photograph by Felipe Buitrago
Light Reading: Brewster Kahle, director of the Internet
Archive, and 'The Scribe,' the scanner used to get
public-domain works online

To the Santa Clara Public Library? Well, no.

There you would have to pick through books on slavery, American history and Lincoln, hoping to find a few
references and piece it together. If the public library even stocked the top books on 19th-century America.
Even so, you'd have to order the books from different branches, wait several weeks and hope you ordered good
stuff.


Maybe the Stanford Library. But then you don't have a Stanford student ID and you'd have to sit in the reading room
when you're supposed to be in school.

Are you kidding? You'd search Google for your topic. And you'd find:

A Microsoft Encarta encyclopedia entry

A Wikipedia article on abolitionism

Links to some university book catalogs

Assorted other hits that include "anti-slavery," "movement" and "19th century" but are basically off-target

Hmmm ... How about taking a look at the James Birney Collection of Anti-Slavery Pamphlets -- a collection of
over 1,000 abolitionist books, pamphlets and newspapers housed at the Johns Hopkins University Libraries?
Fancy a trip to Baltimore? Right now, that's the only way you'll get to look at them.




But soon enough, the entire collection will be online as high-quality scans. So will the complete personal library
of John Adams (housed at the Boston Public Library), the Getty Research Institute's collections on art and
architecture, the full archive of publications of the Metropolitan Museum of Art and UC-Berkeley's extensive
collection of texts from the Gold Rush.

"Many people are turning to the net as the public library," says Brewster Kahle, director of the Internet
Archive in San Francisco. "Unless the works are available on the Internet, they will be unavailable to the
next generation. Our role is to make great materials available to our children."

Last month, the Sloan Foundation awarded a $1 million grant to the Internet Archive and the Open Content
Alliance to scan and put online those classic materials from America's past. The award is a stake in the
ground, a flag that says information should not only be online, but truly free, truly accessible, no matter what
search engine brings you to the content.

"The capability to digitize all recorded human knowledge now exists for the first time, and it is important that
we seize this moment and ensure that public works and the public domain at large remain in the hands of the
public," says Doron Weber, the program director of Public Understanding of Science and Technology at the
New York-based Sloan Foundation.


Google  Alternative 


The project is fundamentally different from Google Books, an initiative the Mountain View search giant
launched in cooperation with Stanford, Harvard, the University of Michigan and several other major
university libraries.

For one thing, a million bucks is a drop in the bucket compared to what Google is spending -- upward of $100
million. Google's footing the bill for all of the scanning, and the universities are giving Google access to
millions of books. That's money schools like Stanford are happy not to be spending themselves.

Google is really putting only one condition on the partner libraries -- the books are only indexed by Google.
That means if you use another search engine, you won't have access to these works. Use Google, you get
access.

That's a deal breaker for people like Kahle. His first company was based on the open source search system,
WAIS, that he developed in the late 1980s. He sold WAIS Inc. to America Online and a second company,
Alexa, to Amazon.com.

Kahle has worked to make information available online for two decades, but the open content movement
goes back even further. It began some 30 years ago when Michael Hart, a professor at the University of
Illinois, launched Project Gutenberg, an online collection of public domain books available in text, HTML
and XML formats. Hart started by typing in texts like Alice in Wonderland and War and Peace. Today there
are 20,000 texts online at Project Gutenberg (www.gutenberg.org), which are also hosted at the Internet
Archive (archive.org).

The Archive serves as a kind of portal to a number of open content efforts, including Gutenberg. The other
projects are not just text renditions of books but full-color scans that can be downloaded as PDFs or in a
highly compressed format called DjVu. Among the efforts: the Million Book Project, Microsoft's book search,
scanning efforts at American and Canadian libraries and the Archive's own scanning efforts. There are a
total of 100,000 public domain books freely available for download and printing on the Archive site.

"People are deciding to go open," says Kahle. "People are interested in having the public domain stay public
domain -- and to do high-quality scanning that would be of value to the public and to researchers."

The books that make up our heritage should be available online, but freely available, says Kahle. "We want the
books available through Google, Yahoo, Microsoft, libraries. The idea of locking things down doesn't make
sense in this Internet age."

When Google Books launched, the company said they would scan books in copyright, as well as public
domain books. That made publishers mad. Really mad. In 2005, publishers and authors sued Google and the
company made some changes to accommodate those concerns.

The Archive project will have no such issues -- it's focused totally on works that are in the public domain.

"The first step is public domain works, then orphan works, then out-of-print works, then in-print works," says
Kahle. "For in-print works, I think we'll see publishers take a role in distributing their works."

Free  the  Orphans 

But the orphans will be locked up for a while longer.

"Orphan works" are those that would have entered the public domain if it weren't for a 1976 rewrite of the
Copyright Act that made copyright registration optional. In 2004, Kahle and ephemeral film collector Rick
Prelinger sued the government to try to "free the orphans."

But a panel of the Ninth Circuit Court of Appeals ruled last month that the orphan works will stay where they
are.

"What is at stake is if libraries of the future can provide access to out-of-print materials after the publishers
and authors are gone," says Kahle. "This case had only one purpose: to get the judge to say that the structure of
copyright had changed so we can get the law examined, and he did not seem to even answer the question.
Very sad. Another opportunity missed by our government. Sometimes I think some of the more senior judges
haven't bothered to understand what is happening to our civic institutions in our digital age."

Perhaps a little copyright history is in order.

For almost 200 years -- from the adoption of the Constitution in 1789 until the bicentennial in 1976 -- you had
to register a copyright, which lasted for a certain number of years, and then renew it. If your work no longer
had commercial value, you wouldn't renew it and it would enter the public domain.

This arrangement suited the Founding Fathers well, as American printers wanted to publish European works
and there weren't a lot of Americans creating their own work.


The rules changed in 1976 with a mammoth rewrite of the Copyright Act. The intent was to bring the United
States into compliance with the Berne Convention, the 1971 international accord on copyright issues, and
the new law did away with the registration and renewal requirement. Now work is copyrighted upon
creation -- you don't even have to publish or print the (c) symbol.


http://www.inetroactive.com/metio/02.07.07/public-libraries-0706.html 


News  &  Culture  in  CA  I  Public  libraries 

They can't be scanned or published online or used in derivative works until their copyright expires. And
copyright now lasts a very long time: a 1998 law named after Sonny Bono extended copyright to the lifetime
of the author plus 70 years.

So Kahle and Prelinger filed suit, hoping that the courts would order the Copyright Office to remove copyright
protection from these works.


"What we are allowed to make available? I can't tell you how many books are caught under this, but it's a huge
number. They're not in print, not available commercially, but under copyright," Kahle says.

In rejecting the Archive's request, the Ninth Circuit judges said that Kahle and Prelinger were essentially
complaining that copyright was too long -- the same argument that was made and rejected in the U.S. Supreme
Court case of Eldred v. Ashcroft. In that case, publisher Eric Eldred argued that creating such very long
copyright terms went against the Constitution's language that copyrights are to be for "limited terms."


The Ninth Circuit wrote: "The outer boundary of 'limited times' is determined by weighing the impetus
provided to authors by longer terms against the benefit provided to the public by shorter terms. That weighing
is left to Congress, subject to rationality review."

Chris Sprigman, the lead lawyer in the Kahle case, wrote on his blog that he was "maddened" by the Appeals
Court's refusal to take on a key aspect of the Supreme Court's Eldred decision -- that unless changes to the
copyright laws "alter the traditional contours" of copyright protection, they don't offend the First
Amendment.

"By implication, when Congress does alter the traditional contours of copyright protection ... the changes to
the law should be subject to heightened scrutiny under the First Amendment to determine whether they
impermissibly burden speech," wrote Sprigman.


Sprigman, Kahle and Prelinger are appealing the decision for review by the full Ninth Circuit Court of
Appeals.


Kahle wants the court to clarify that groups like the Internet Archive can make out-of-print works available on
the Internet.

"Otherwise we live in a world of just very old works in the public domain and commercially available works.
Everything in between effectively will be denied the next generation," he says. "We could lose the 20th
century."


Searching E-phemera

INTERNET  |  Want  to  find  that  cool  Web  page  again?  Look  to  this  Net  archive. 

By  John  Wenzel 
Denver  Post  Staff  Writer 
DenverPost.com 

Article  Last  Updated:  12/25/2006  06:38:40  PM  MST 


The  Library  of  Alexandria,  that  great  Egyptian  epicenter  of  learning,  fell  prey  to  numerous  fires  and  political  agendas  about  2,000  years  ago. 

It's  fitting,  then,  that  the  library  was  both  an  inspiration  and  a  cautionary  tale  for  the  Internet  Archive,  a  wildly  ambitious  project  aimed  at 
cataloging  every  pixel  of  the  Internet. 

Sound  impossible?  It  is,  technically,  since  the  Internet's  vast  lands  can  never  be  mapped  100  percent.  But  digitally  collecting  the  most 
significant  (and  insignificant)  websites  is  more  than  just  a  hobby,  it's  an  essential  preservation  project. 

"The  average  life  of  a  Web  page  is  100  days,"  said  Internet  Archive  founder  Brewster  Kahle.  "If  we  don't  aggressively  go  and  archive  these 
materials  that  we're  depending  on,  not  just  for  scholarship  but  for  cultural  fun  and  trivial  pursuits,  they  will  be  gone  forever." 

Kahle's  San  Francisco-based  nonprofit,  funded  by  private  and  public  institutions  such  as  the  Library  of  Congress,  is  celebrating  its  10th  year  and 
recent  legal  successes.  Its  goal  is  simple:  to  use  technology  to  democratize  access  to  cultural  material. 

The  free,  publicly  accessible  site  at  Archive.org  is  the  world's  largest.  It  has  collected  more  than  85  billion  Web  pages  since  it  started  in  1996. 
Searching  through  the  sites  shows  just  how  far  the  medium  has  really  come. 

Blocky,  simplistic  logos  and  clunky  hyperlinks  adorn  even  the  highest-end  corporate  websites,  from  McDonald's  to  General  Motors.  Forget 
about  streaming  audio  or  video:  Websites  10  years  ago  could  barely  coax  low-res  photos  from  a  dial-up. 

Of  course,  the  web  has  grown  exponentially  since  then,  touching  nearly  every  corner  of  commerce  and  culture.  It  has  grown  so  large,  in  fact, 
that  the  Internet  Archive  measures  its  collection  in  petabytes.  They  boast  1.5  petabytes,  or  1.5  million  gigabytes,  of  information  on  2,300 
different  servers.  That's  enough  to  fill  50,000  of  the  30  gig  iPods  -  products  with  more  memory  than  most  commercial  computers  had  five  years 
ago. 

"We  just  moved  to  a  new  data  center  that's  twice  as  big  and  twice  as  powerful,"  said  Kristine  Hanna,  director  of  web  archiving  services.  "That 
may  not  seem  very  sexy,  but  for  a  company  with  this  much  data,  it's  pretty  special." 

Hanna,  founder  of  Girl  Geeks.org  and  a  former  producer  at  Lucasfilm  and  Warner  Bros.,  said  backing  up  the  Internet  Archive  is  essential.  Their 
offices  in  the  Presidio  cannot  contain  the  multiple  copies  they  require. 

"We  have  copies  here  in  San  Francisco,  and  at  partial  sites  in  Paris  and  Amsterdam  and  the  Biblioteca  Alexandria  in  Egypt,"  said  Hanna.  "We 
put  it  on  discs,  then  store  them  on  these  6-foot-tall,  2-foot-wide  racks  that  each  weigh  about  a  ton.  Then  we  ship  out  about  eight-10  of  these  on 
boats  and  airplanes  to  the  respective  countries." 

Using  the  Wayback  Machine  search  engine  at  Archive.org  is  a  surprisingly  comprehensive  experience.  Even  obscure  personal  webzines  from 
the  late  '90s  have  survived  on  their  servers,  which  automatically  update  every  two  months  using  a  specialized  spider,  or  web  crawler. 

"I  grew  up  in  the  '70s  and  '80s  and  there  was  this  reigning  mythology  that  people  would  only  do  things  if  you  paid  them  to,"  said  founder  Kahle. 
"But  there  are  50  million  websites  out  there  now,  and  well  over  49  million  of  them  have  no  commercial  content  at  all.  People  just  love  to 
share." 

Kahle  said  the  web  has  not  progressed  as  quickly  as  he  wanted  -  he  expected  to  see  all  the  world's  books,  video  and  music  online  by  now  -  but 
that  it  still  remains  a  revolutionary  medium. 

"The human part of me looks around on the 'Net sometimes and thinks, 'This is cool!' We have collectively built up a resource that's
mind-blowing  compared  with  the  library  I'd  slog  down  to  as  a  kid." 

The  Internet  Archive  is  just  one  element  of  a  larger  project  that  also  collects  moving  images,  scanned  books,  software  programs  and  audio, 
numbering  in  the  hundreds  of  thousands  in  total.  Its  noncommercial  aim  is  the  same  as  the  Archive's:  to  provide  free,  open  source  content  to 
anyone  with  a  computer.  A  "dark,"  or  private,  television  archive  is  also  in  the  works  with  news  and  programming  from  dozens  of  international 
TV  stations.  But  intellectual  property  concerns  constantly  pop  up. 

"We're  trying  to  play  a  role  in  the  policy  area,"  said  Kahle,  who  asserts  the  project  has  no  agenda  other  than  to  provide  free  content  and  to 
respect copyrights. Just last month his Archive won an exemption from the Digital Millennium Copyright Act of 1998.

"It's  going  to  require  vigilance  because  every  five  years  there's  a  new  attack  on  the  Internet  -  the  open  nature  of  it,  phone  companies  vs.  the 
Internet,  AOL  vs.  the  larger  Web,"  Kahle  said.  "The  current  drama  is  playing  out  at  the  content  layer,  and  it  looks  like  Google  is  really  making  a 


http://www. denverpost.com/portlety  aiticle/html/fragments/print_article.jsp'?articleld=4898746&siteld=36 


Preserving the Internet (News) Mary O'Regan



February  12,  2007 



Preserving  the  Internet 

How  archiving  online  content  can  make  history 

—By  Mary  O'Regan,  Utne.com 

February  8,  2007  Issue 

Many of us can't remember the last time we put pen to paper and wrote
out a letter. It's so much easier to type a quick email, zip it through a
maze of cables and wires, and have it delivered to the recipient's inbox
within minutes. Virtual mail has become the preferred medium for nearly
all occupations. But according to Crease, each time you click the "send"
button, a little piece of history might be lost.

Imagine  twenty,  fifty,  even  two  hundred  years  from  now.  What  will 
become  of  all  of  the  emails,  websites,  and  web-based  information  that  we 
currently  take  for  granted?  Will  the  great  essays,  scientific  discoveries, 
and  original  art  born  on  the  internet  be  lost  forever  simply  because  a 
server  shut  down?  Unless  historians  think  beyond  paper  and  come  up 
with  a  way  to  preserve  our  digital  footprints,  that's  a  lot  of  ideas  that 
could end up in the garbage icon.

According  to  Crease,  advancements  in  communication  technology  "are 
good  for  scientists,  encouraging  rapid  communication  and  stripping  out 
hierarchies."  But  the  latest  fear  among  historians  --  who  rely  on  such 
correspondence  to  chronicle  scientific  developments,  reactions  to  them, 
and  to  better  understand  scientists  themselves  --  is  "whether  email  and 
other electronic data will be preserved at all." Luckily, there are several
organizations taking steps to ensure that all hope -- and information -- is
not lost.

As reported by Crease, historians at the American Institute of Physics
charged with tracking the history of physics in industry have discovered
some  interesting  findings  regarding  the  influence  of  communication 
technology  on  science.  Among  them:  "PowerPoint...  can  stultify  scientific 
discussion  and  make  it  less  free-wheeling;  information  also  tends  to  be 
dumbed  down  when  scientists  submit  PowerPoint  presentations  in  place  of 
formal  reports." 

Working  on  the  preservation  front,  Crease  also  notes  that  the  Stanford 
Linear  Accelerator  Center  has  collaborated  with  several  Institutions  to 
create  the  Persistent  Archives  Testbed  (PAT)  project.  Launched  in  2003, 
the  project  has  brought  together  researchers  to  test  a  model  for 
electronic records management. As explained on PAT's website, should
the  research  prove  successful,  the  system  would  introduce  a 
"cost-effective  application  and  architecture  for  preserving  electronic 
records."  Translation:   Important  discoveries  found  or  discussed  via  email 
could  be  saved  from  the  vacuum  of  cyberspace. 


Of  course,  scientists  aren't  the  only  ones  with  data  worth  protecting.  In 
San Francisco, the nonprofit Internet Archive is building an internet
library -- a kind of Library of Alexandria for the 21st century. The
organization's website states that, "If libraries are to continue to foster
education and scholarship in this era of digital technology, it's essential
for  them  to  extend  those  functions  into  the  digital  world."  In  collaboration 
with  institutions  such  as  the  Library  of  Congress  and  the  Smithsonian,  the 
Internet  Archive  seeks  to  trace  the  evolution  of  the  internet,  track 
language  changes,  revive  dead  links,  establish  international  internet 
centers,  and  exercise  our  "right  to  remember."  A  search  feature  on  the 
site  called  the  Wayback  Machine  allows  users  to  browse  through  85 
billion  web  pages  to  find  pages  from  as  early  as  1996,  including  those 
that  no  longer  exist.  Simply  type  in  the  URL  of  the  site  you're  looking  for 
and  a  list,  sorted  by  year,  pops  up  with  links  to  each  former  incarnation. 

Go there >> The Lost Art of the Letter

Go there, too >> The Internet Archive

play for locking it up. They're digitizing great libraries and trying to be the only place you can go to for them."

The  Internet  Archive  only  will  become  more  relevant  as  fewer  people  visit  physical  libraries.  The  Denver  Public  Library's  figures  show  that 
cardholding  members  dropped  by  nearly  75,000  from  2003-2005,  even  as  the  city's  population  increased. 

But  archiving  the  Web  in  general  is  just  as  important  as  keeping  it  open  to  the  public. 

"People  don't  quite  understand  that  if  it's  on  the  Internet,  it's  not  there  forever,"  Hanna  said. 

Archive,  by  the  numbers 

The  Internet  Archive  at  Archive.org  boasts  more  than  a  few  impressive  numbers.  Here's  a  sampling: 

60  days:  How  often  the  archive  collects  a  broad  snapshot  of  the  web. 
100  days:  How  often,  on  average,  web  pages  are  changed  or  deleted. 
10  years:  The  amount  of  time  the  archive  has  been  operating. 

300:  The  amount  of  requests  the  Wayback  Machine  search  engine  receives  per  second. 

2,300:  The  number  of  servers  used  at  the  archive's  San  Francisco  location.  Each  holds  about  a  terabyte,  or  1,024  gigabytes,  of  information. 

50,000:  The  number  of  30  gigabyte  iPods  needed  to  store  the  archive's  collected  information. 

90,000:  The  number  of  audio  recordings,  concerts  and  lectures  on  the  nonweb  archive. 

1.5 million: Total amount of memory, in gigabytes, on Archive.org's web project.

61  million:  The  number  of  unique  pages  in  the  archive's  Hurricane  Katrina  collection,  all  text  searchable,  from  over  1,700  different  sites. 
85  billion:  The  number  of  pages  on  the  current  archive.  That's  a  13-page  website  for  each  man,  woman  and  child  on  Earth. 

-  John  Wenzel 
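
Two of the sidebar's figures can be cross-checked with simple arithmetic. The sketch below is only a sanity check on the numbers as printed; the function names are illustrative, and the world population it derives is implied by the "13-page website" remark rather than stated in the article.

```python
# Figures taken from the sidebar.
TOTAL_GIGABYTES = 1_500_000      # 1.5 million gigabytes in the web collection
IPOD_GIGABYTES = 30              # capacity of one 30 gigabyte iPod
TOTAL_PAGES = 85_000_000_000     # 85 billion archived pages
PAGES_PER_PERSON = 13            # "a 13-page website for each man, woman and child"


def ipods_needed(total_gb: int, ipod_gb: int) -> int:
    """Number of iPods of the given capacity needed to hold total_gb."""
    return total_gb // ipod_gb


def implied_population(total_pages: int, pages_per_person: int) -> float:
    """World population implied by spreading the pages evenly across everyone."""
    return total_pages / pages_per_person


print(ipods_needed(TOTAL_GIGABYTES, IPOD_GIGABYTES))  # 50000
print(round(implied_population(TOTAL_PAGES, PAGES_PER_PERSON) / 1e9, 1))  # 6.5 (billion people)
```

The first result matches the sidebar's 50,000 iPods, and the second implies a world population of roughly 6.5 billion.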

When a Web site dies, it goes to Web heaven
- Archive.org.

The Internet generally has complete
disregard for keeping records or charting its
past. Once a Web site is gone, it seems to
disappear into the digital ether.

A  message  on  Archive.org  says  that  with  the 
large  amount  of  public  records  and 
information  moving  online,  Internet  libraries 
have  become  necessary  to  maintain  the 
public's "right to remember": "The Internet
Archive is working to prevent the Internet - a
new  medium  with  major  historical 
significance  -  and  other  'born-digital' 
materials  from  disappearing  into  the  past." 

The  nonprofit  site,  which  has  logged  more 
than  55  billion  Web  pages,  was  founded  by 
Brewster  Kahle  in  1996  and  collaborates  with 
the  Library  of  Congress  and  the  Smithsonian. 
It  doesn't  index  sites  that  are 
password-protected  or  blocked  to  the  public. 
Though the amount of text recorded by
Archive.org is greater than the collection of
the Library of Congress, the site also stores live music (especially Grateful
Dead shows), old video and movies that have passed into public domain, like
1922's "Nosferatu" and the 1949 noir "D.O.A."

The  legal  admissibility  in  court  of  old  sites  saved  by  Archive.org  is  an 
interesting  legal  battle  likely  to  emerge  more  frequently.  Earlier  this  year,  a 
health  care  company  named  Healthcare  Advocates  Inc.  sued  Archive.org  after 
it  lost  a  2003  case  that  turned  on  the  evidence  of  Healthcare  Advocates'  old 
Web  site  (saved  on  Archive.org). 

Archive.org  isn't  the  only  site  trying  to  save  the  ever-changing  Web. 
Wikipedia.com,  which  constantly  regurgitates  itself  with  user-inputted  data,  is 
now being watched by wikidumper.blogspot.com.

Any  information  not  truthful  enough  to  make  it  into  Wikipedia  is  probably 
dubious  twice  over,  but  Wikidumper  helps  provide  some  oversight  to  the 
editors  of  Wikipedia,  who  can  take  down  an  entry  for  any  number  of  reasons. 

All  of  which  goes  to  show,  be  careful  what  you  type  and  publish  -  it  might  be 
out  there  forever. 
When Bank of America (NYSE: BAC) merged with MBNA
last  year,  Manhattan  banking  center  manager  Ethan  Chandler  was  stoked. 
Chandler, with his co-worker Jim Debois on guitar, adapted U2's "One" to the
circumstance - the chorus now reading, "We are one bank." The earnest
performance  -  which  looks  like  a  real-life  episode  of  "The  Office"  -  has 
become  a  minor  online  sensation,  and  recently  spawned  an  homage  from 
comedian  David  Cross  (which  is  also  available  on  YouTube). 

DOWNLOAD  THIS:  "Cowgirl  in  the  Sand,"  Neil  Young 

Neil  Young  recently  released  "Live  at  the  Fillmore  East,"  a  six-song  disc 
recorded on March 6-7, 1971, at the famed New York venue. Of the six songs
on the album, though, two aren't available for download: "Down by the River"
and "Cowgirl in the Sand," each of which runs more than 12 minutes. They are
both  jaw-dropping  performances,  in  particular  "Cowgirl,"  which  features 
blistering  guitar  interplay  between  Young  and  Crazy  Horse's  Danny  Whitten, 
who  died  of  a  heroin  overdose  in  1972.  On  a  jukebox  in  a  bar,  every  song  costs 
the  same  -  whether  it's  3  minutes  or  16.  Unfortunately,  that's  often  not  the  case 
for  digital  jukes  like  iTunes.  If  they  have  to  double  the  song's  price,  so  be  it,  but 
keeping  iTunes  clear  of  long  songs  is  a  bad  precedent  to  set. 

EDITOR'S  NOTE  -  What's  your  favorite  Web  site?  E-mail  AP  Entertainment 
Writer Jake Coyle at jcoyle(at)ap.org

Copyright  2006  Associated  Press.  All  rights  reserved.  This  material  may  not 
be published, broadcast, rewritten, or redistributed.
Monday, September 10, 2007

The  Internet's  Wayback  Machine 

Web  History  101 

archive.org

By  Dennis  Fisher 

The  Internet,  by  design,  is  a  dynamic,  ephemeral 
thing.  But  the  folks  at  the  Internet  Archive  are  doing 
their  best  to  put  it  into  suspended  animation. 
Conceived in 1996 as a way to create a record of
interesting  sites,  the  Internet  Archive  has  expanded 
to  become  a  massive  storehouse  of  data,  music, 
moving  images,  and  now  books. 

"We  wanted  to  build  the  Library  of  Alexandria,  version 
two.  We  have  the  opportunity  to  have  all  of  the 
creative  works  of  humankind  in  one  place,"  says 
Brewster  Kahle,  one  of  the  founders  of  the  project. 
"It's  the  great  opportunity  of  our  lifetime." 

The  collection  includes  an  enormous  live  music 
section,  comprising  upward  of  40,000  shows  from 
bands  such  as  Robert  Randolph  and  the  Family 
Band,  OAR,  the  Grateful  Dead  (below,  with  Bob 
Dylan  from  1987),  and  others,  all  of  which  can  be 
downloaded  --  legally  and  for  free. 

But  the  heart  of  the  project  is  the  Archive's  Wayback 
Machine,  a  kind  of  search  engine  that  can  take  you 
back  to  eBay's  home  page  from  1999  (get  your  Y2K 
supplies here!) or let you relive the 2004 World Series
on Boston.com (pigs can fly, hell is frozen, the
Red Sox have won the World Series). There are more
than  50  million  sites  and  4  billion  pages  in  the 
Internet  Archive,  including  a  number  of 
shiver-inducing collections. The Sept. 11 archive
chronicles the horror of that day through pages from
The New York Times, Boston Globe, and other sites,
as well as hundreds of hours of video from news
programs in the days surrounding the attacks.

And don't miss the Prelinger Archives, a gold mine of
kitschy government and corporate promotional and
"educational" films from the 1920s through the
1980s. Where else can you see a dozen California
homemakers making the world's largest omelet to
showcase the quality of Sonoma County eggs?

The Internet's Wayback Machine - InSidekick - Boston.com
http://www.boston.com/ae/sidekickyinsidekicky2007/09/the_internets_w.html
network.

Key  suppliers:  Capricorn  Technologies 

This  San  Francisco-based  organization  is  faced  with  the  Herculean  chore  of  recording  all  the  Web  pages  ever 
produced,  a  mind-boggling  task  that  requires  some  serious  storage  backbone. 
"We're also digitizing hundreds of thousands
of  books,"  explains  Brewster  Kahle,  the 
Internet  Archive's  founder,  adding  that  the 
organization  is  looking  to  store  music  and 
video  data  as  well. 

"There are a couple of other companies that
do things on this scale - Google and the
hotmail  application,  but  there  are  not  many 
more  that  I  am  aware  of." 

Like -:, the Internet Archive is also
harnessing the power of thousands of Linux
machines.

Founded  in  1996,  the  Internet  Archive  is 
essentially  a  massive  digital  library  and 
currently  contains  around  2  Pbytes  of  data  in 
its  storage  network.  This  network  consists  of 
around  2,000  Linux  boxes  from  Capricorn 
Technologies,  linked  via  Ethernet,  each  of 
which  contains  around  4  Tbytes. 
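
As a rough check on the figures above, the raw capacity implied by about 2,000 machines at about 4 Tbytes each can be worked out directly. The sketch below assumes decimal units (1 Pbyte = 1,000 Tbytes); the variable names are illustrative, and the article does not say how much of the raw capacity is taken up by redundancy or is still empty.

```python
# Figures quoted in the article (both approximate).
machines = 2000          # Linux boxes in the storage network
tbytes_per_machine = 4   # Tbytes held by each box

# Assumption (not from the article): decimal units, 1 Pbyte = 1,000 Tbytes.
TBYTES_PER_PBYTE = 1000

raw_pbytes = machines * tbytes_per_machine / TBYTES_PER_PBYTE
stored_pbytes = 2        # "around 2 Pbytes of data" per the article

print(f"Raw capacity: {raw_pbytes:g} Pbytes")
print(f"Stored data as share of raw capacity: {stored_pbytes / raw_pbytes:.0%}")
```

On these numbers the raw capacity comes to about 8 Pbytes, so the roughly 2 Pbytes of stored data would be about a quarter of it. The article does not explain the difference.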

These  "Petaboxes,"  as  they  are  called, 
enable  the  Internet  Archive  to  quickly  scale 
up  its  operation  without  breaking  the  bank, 
according  to  Kahle:  "It's  low-cost,  easy  to 
maintain,  low  power,  and  high  density." 

Archived  Web-page  data  is  accessed  via  the 
Internet  using  a  portal  called  "Wayback"  built 
by  Kahle  and  his  team.  The  name  recalls  a 
segment  of  the  Rocky  &  Bullwinkle  cartoon 
show  of  the  1960s. 
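
For readers who want to try the portal, archived pages are addressed by a timestamp followed by the original URL. The sketch below builds such an address without touching the network. The `web.archive.org/web/<timestamp>/<url>` pattern is the Wayback Machine's public address format; the function name, the example site, and the example date are illustrative choices, not something described in the article.

```python
from datetime import datetime

# Public Wayback Machine address prefix.
WAYBACK_PREFIX = "https://web.archive.org/web"

def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback address asking for the capture nearest to `when`.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"

# Example: a page as it looked around December 1999.
print(wayback_url("http://www.ebay.com/", datetime(1999, 12, 1)))
```

If no capture exists for the exact timestamp, the Wayback Machine redirects to the nearest one it holds, so the date only needs to be approximate.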

For  the  Archive's  future  storage  needs,  Kahle 
tells  Byte  and  Switch  that  he  is  looking  at 
Sun's recently launched , a
mobile data center shipped in a container to
users' sites.
"This is the first major step forward I have
seen  that  would  be  interesting  to  someone  on 
the  scale  of  the  Internet  Archive,"  he  says. 
"You  can  put  three  Petabytes  in  a  shipping 
container.  That,  we  think,  is  a  real  contender." 
